




JOURNAL of EDUCATIONAL 
RESEARCH 





Volume XXXVI MARCH, 1943 Number 7 





A PRACTICAL PROCEDURE FOR THE RIGOROUS INTERPRETATION OF TEST-RETEST SCORES IN TERMS OF PUPIL GROWTH*


H. C. TRIMBLE
Iowa State Teachers College
and
LEE J. CRONBACH
State College of Washington


Editor's note: The interpretation of growth scores is a complicated technical problem and very inadequately handled in many instances. This paper offers new material in an important area.

DRAKE¹ has recently called attention to the fact that students with low initial test scores tend to make greater gains than pupils with high initial scores. While statistical techniques are available for studying the growth of pupils in formal experiments, present methods are not well adapted to the needs of the classroom teacher who is trying to decide how satisfactory the growth of his own particular students has been. This article describes a simple procedure for analysing test scores which has been found effective in evaluating pupil growth; this procedure retains the advantages to be gained by using standardized tests but guards against the implication that a certain level of achievement is appropriate for every pupil regardless of his previous attainment.
* Methods essentially similar to those described in this paper were used by H. C. Trimble and Anne Trimble in making some interpretations of test results for co[...]tion Staff of the Eight-Year Study. The refinement of this original work and the paper here presented were principally the work of Lee J. Cronbach.
¹ Charles A. Drake, "The Iota Function," Journal of Educational Research, XXXIV (November, 1940), pp. 190-198.

Data on individual growth are available whenever pupils can be tested at intervals with the same test or with equivalent tests. The simple procedure of estimating growth by subtracting the initial score from the final score for each pupil is superficial in that it fails to acknowledge the possible effects of errors of measurement; these errors are especially serious in view of the factor of regression which has confused many recent studies of psychological growth. If a student's measured score on a given test is above average, it is likely that his "true score" is smaller than this measured score; in other words, if one tested this student several times under conditions which allowed no chance for change in the abilities being tested, his average score would probably be lower than this first score.² Similarly, the estimate made for the student whose score is below the mean is apt to be lower than his true ability. If such a student, apparently "poor" on the initial test, were retested the next day, his score would probably be better than before, due merely to regression. Clearly, if a student's score improves or declines over the course of a school year, and especially if the score of a "poor" student improves or that of a "good" student declines, one cannot confidently decide how much of the change is due to growth and how much is due to errors of measurement unless a more refined procedure is adopted.
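The regression effect can be illustrated with a small simulation. This is a sketch under assumed values only (true-score mean 50, true-score standard deviation 10, error standard deviation 5; none of these come from the article): each pupil's ability is held fixed between the two testings, so any average change in a selected group is produced by errors of measurement alone.

```python
import random
import statistics


def simulate_regression(n=20000, true_mean=50.0, true_sd=10.0, error_sd=5.0, seed=1):
    """Simulate two testings of the same pupils with no real change in ability.

    Each observed score is a fixed true score plus independent measurement
    error (all parameter values are illustrative assumptions). Returns the
    mean (first, second) scores for pupils whose first score was more than
    one true-score SD above the mean, and for those more than one SD below it.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        true = rng.gauss(true_mean, true_sd)
        first = true + rng.gauss(0.0, error_sd)
        second = true + rng.gauss(0.0, error_sd)  # same ability, fresh error
        pairs.append((first, second))

    high = [p for p in pairs if p[0] > true_mean + true_sd]
    low = [p for p in pairs if p[0] < true_mean - true_sd]

    def means(group):
        return (statistics.mean([p[0] for p in group]),
                statistics.mean([p[1] for p in group]))

    return means(high), means(low)


(high_first, high_second), (low_first, low_second) = simulate_regression()
print(f"high on first test: {high_first:.1f} -> {high_second:.1f}")
print(f"low on first test:  {low_first:.1f} -> {low_second:.1f}")
```

Run as written, the pupils selected for high first scores average lower on the retest and those selected for low first scores average higher, although no pupil's ability changed.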

A second difficulty with overly simplified methods of determining 
growth is that even if one were sure that a given student had actually gained 
ten points in ability, this fact alone would not be meaningful. First, one 
would have to know how much of this improvement is due to experience 
gained from the first testing—the so-called practice effect. Second, one needs 
to know how the growth of the student compares with the growth of other 
students in his class and with the growth of students in other groups. The 
answer to this second query is often approached through standard test norms, 
by means of which a pupil's status, but not his growth, may be compared 
to that of a large group of pupils. 

² Horn, Alice McAnulty, Uneven Distribution of the Effects of Specific Factors, pp. 12-14. Southern California Education Monographs, No. 12. Los Angeles: University of Southern California Press, 1941.

Perhaps the simplest way to compare the growth of different students is to obtain by subtraction the gain of each student, to average these values to determine the mean gain for the group, and to compare individual gains with the mean gain. Such a process of course neglects regression errors, though these may compensate for one another in the group average. In addition, it is a misleading method because a growth of ten points on a test may not mean the same when the change is from thirty to forty as when the change is from eighty to ninety. If, for example, one student answers all but ten items out of one hundred correctly on the first testing, an improvement of ten points on a retest requires that his knowledge be perfect and that he avoid any errors due to carelessness and similar extraneous factors. It is impossible for him to make more than a ten-point increase; if he improved from ninety to one hundred, and if the average gain for the class were twelve points, the superficial procedure would report that he had gained less than other students. Hence this "ceiling effect" is another factor which may make it easier for the poor student to improve his score than for the average or good student to improve an equal number of points.
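A minimal sketch of the superficial procedure, run on a hypothetical four-pupil class taking a 100-item test, shows the ceiling problem just described: the pupil who moved from ninety to one hundred gained every point available to him, yet is reported as two points below the mean gain of twelve. The function name and the class data are illustrative only.

```python
def naive_gain_report(scores, max_score=100):
    """The 'superficial' comparison: each pupil's raw gain against the class mean gain.

    scores maps a pupil label to (initial, final). For each pupil the report gives
    the raw gain, its difference from the mean gain, and the largest gain the
    test's ceiling left room for.
    """
    gains = {name: final - initial for name, (initial, final) in scores.items()}
    mean_gain = sum(gains.values()) / len(gains)
    report = {
        name: {
            "gain": gains[name],
            "vs_mean": gains[name] - mean_gain,
            "max_possible": max_score - initial,  # room left under the ceiling
        }
        for name, (initial, _final) in scores.items()
    }
    return mean_gain, report


# Hypothetical class on a 100-item test; gains of 10, 14, 12, and 12.
mean_gain, report = naive_gain_report(
    {"top": (90, 100), "p2": (30, 44), "p3": (50, 62), "p4": (60, 72)}
)
print(mean_gain)                      # 12.0
print(report["top"]["vs_mean"])       # -2.0: reported as "below average"
print(report["top"]["max_possible"])  # 10: he gained every point available
```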
A graphic procedure for comparing the pupil's initial and final scores has been found satisfactory in reducing these difficulties of interpretation. While it is based on statistical methods, no special knowledge of statistics is required for a teacher to use this method, once a chart has been prepared for a test. In this discussion, the use of the chart will be explained first; detailed steps to be followed in the preparation of the chart will then be presented.


USE OF CHARTS 


Figure 1 shows a chart which was designed to be used to interpret growth from the tenth to the eleventh grade in the behavior measured by the score "Ratio"³ on tests 1.41 and 1.42 published by the Progressive Education Association. This chart was selected because it illustrates especially well the procedure to be used; it is not necessarily typical, as the slope and the curvature of the lines in charts for various tests and groups of pupils vary widely. This chart was prepared for experimental purposes and was based on a sample of one hundred and thirty pupils from the Thirty Schools in the Eight-Year Study of the Progressive Education Association. The norms indicated here are not intended to be representative.

³ The score known as "Ratio" is an index of the student's tendency to use many reasons to support his decisions on social problems; a high score indicates use of many reasons. Tests of this sort have been described in The Social Studies in General Education (New York: Appleton-Century, 1940) and elsewhere.

The scores of students on two similar tests taken one year apart are plotted on the chart, a student's score on the first test (1.41) being plotted on the horizontal scale and his score on the retest (1.42, a parallel form) on the vertical scale. A code number or letter is usually placed on the chart to indicate the position of each student. The letter A on this chart indicates the plotted position of a student who made a score of 4.7 on the first test and 6.0 on the second. Similarly, point B represents scores of 3.0 and 3.2, and point C represents scores of 4.0 and 4.5. Line (1) indicates the score which a student might be expected to make when retested if he has made no growth between testings; thus, if a student had a score of 5.0 on the first test and were retested before his behavior had changed, his most probable score would be 4.7. This procedure acknowledges the effect of errors of measurement. Since one does not know exactly what the student's ability is at the time of either the first or the second test, it is impossible to make precise judgments about the growth of individuals; allowance for inaccuracy is made using lines (2) and (3). By noting the position where a student's scores are plotted, we may make the following interpretations:

(a) If the point falls above line (1), it is more likely than not that the student has improved since the first testing; if the point falls above line (2), there are five chances in six that the change in score represents true growth in ability.⁴

(b) If the point falls below line (1), there is greater than a fifty per cent chance that this change represents a true decline in ability; if the point falls below line (3), there are five chances in six that the change represents a true decline.


Using these rules and the points plotted on the chart, one may conclude that pupils A and C probably have improved since the first test in the behavior being measured; that is, the change in their scores may represent actual gain rather than errors of measurement, regression, and the like. Although pupil B has gained slightly in raw score, it is nevertheless probable that he made no gain or actually declined in ability, as the typical student with a score of 3.0 would make a greater gain than he did from regression factors alone. Similarly, pupil D is so close to line (1) that the most reasonable judgment is that his behavior has changed little. Not only has pupil A apparently improved; his improvement is so great (above line (2)) that it is highly improbable that it is due to chance factors and errors of measurement. In this sense, his growth may be considered significant.


Lines (4), (5), and (6) are used to compare the growth of students with the growth made by similar students over a one-year period.


⁴ This can be expressed alternatively: if the point falls above line (2), there is only one chance in six that this gain in score would arise entirely from chance and errors of measurement.













(c) If a point falls above line (4), it is more likely than not that the student has gained more during the year than the average student who had the same initial ability; if the point is above line (5), it is probable that he has made greater growth than is made during a year by five out of six students with the same initial ability.

(d) If a point falls below line (4), it is more likely than not that the student has gained less during the year than the average student of the same initial ability; if the point falls below line (6), it is probable that he has made less growth (or greater decline) than is made during a year by five out of six students with the same initial ability.

In shorter form, we may say that a point above line (4) indicates greater than normal growth, and that growth to a point above line (5) is "unusual." It must be noted that the position of any point is a probable, rather than an exact, description of a pupil's performance, as the scores obtained on both the first and second testings are somewhat unreliable.


According to the chart, one would conclude that pupils C and D have made about as much growth as the average student who was equal to them initially, while the growth of pupil A appears to be greater than that of other students with comparable initial scores. While the actual change of score for pupil B is small, judged by the position of line (1), when line (4) is considered this failure to gain appears more serious, as the average pupil equal to B at the start of the year had a score of 3.9 at the end of the year, compared to B's score of 3.1.
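Rules (a) to (d) amount to locating a point relative to a centre line and the two lines drawn one standard error of estimate on either side of it. The sketch below encodes them. The straight lines and constant spreads are hypothetical stand-ins for the smooth curves of Figure 1, chosen only so that line (1) passes through 4.7 at an initial score of 5.0 and line (4) through 3.9 at 3.0, as the text states; the actual lines are curved and their spread may vary along the scale.

```python
from typing import Callable, Dict, Tuple

Line = Callable[[float], float]


def position(initial: float, retest: float, centre: Line, se: float) -> str:
    """Locate a plotted point relative to a centre line and the lines drawn
    one standard error of estimate (se) above and below it."""
    c = centre(initial)
    if retest > c + se:
        return "above upper"
    if retest > c:
        return "above centre"
    if retest < c - se:
        return "below lower"
    if retest < c:
        return "below centre"
    return "on centre"


# Rules (a) and (b): lines (1), (2), (3) from the comparability study.
NO_GROWTH_READING = {
    "above upper": "true growth, five chances in six (above line (2))",
    "above centre": "probably improved (above line (1))",
    "below centre": "probably declined (below line (1))",
    "below lower": "true decline, five chances in six (below line (3))",
    "on centre": "no evidence of change",
}

# Rules (c) and (d): lines (4), (5), (6) from the one-year growth study.
GROWTH_READING = {
    "above upper": "more growth than five of six similar pupils (above line (5))",
    "above centre": "more growth than the average similar pupil (above line (4))",
    "below centre": "less growth than the average similar pupil (below line (4))",
    "below lower": "less growth than five of six similar pupils (below line (6))",
    "on centre": "about average growth",
}


# Hypothetical straight lines: only line1(5.0) = 4.7 and line4(3.0) = 3.9
# come from the text; slopes and spreads are invented for illustration.
def line1(x: float) -> float:
    return 0.6 * x + 1.7


def line4(x: float) -> float:
    return 0.55 * x + 2.25


SE_NO_GROWTH = 0.5  # assumed constant gap between line (1) and lines (2)/(3)
SE_GROWTH = 1.2     # assumed constant gap between line (4) and lines (5)/(6)


def interpret(initial: float, retest: float) -> Tuple[str, str]:
    """Return the no-growth reading and the normal-growth reading for one pupil."""
    return (
        NO_GROWTH_READING[position(initial, retest, line1, SE_NO_GROWTH)],
        GROWTH_READING[position(initial, retest, line4, SE_GROWTH)],
    )


pupils: Dict[str, Tuple[float, float]] = {"A": (4.7, 6.0), "B": (3.0, 3.2), "C": (4.0, 4.5)}
for name, (first, second) in pupils.items():
    change, growth = interpret(first, second)
    print(f"{name}: {change}; {growth}")
```

With these assumed lines the sketch places A above line (2), B between lines (3) and (1), and C between lines (1) and (2), in line with the readings given in the text.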
A class of about twenty students may be efficiently plotted on one chart; for larger classes, two (or more) sheets are used to reduce crowding on the chart. After the plotting is finished, the teacher may inspect the class sheet and determine instantly the code numbers of those pupils who have made unusual improvement, or who have shown an unusual lack of improvement. The interpretation for any individual is made readily, as above. One further valuable outcome is that the picture of the growth of the group as a whole may sometimes give meaningful insights into the teaching process. One instance of this was the case of a group where several students made unusual gains in a certain test score, while there were others making much less growth than would normally be expected. Inspection of the chart showed that the points above line (5) were clustered at the right of the chart, while the points below line (6) were clustered at the left of the chart. This implied that the year's work had been more effective than that of the normal classroom in aiding the best students to make improvement, but that those who were poor at the outset made very little progress. Such a finding can suggest important changes in the teaching procedure. Data of this type have been used in the biology department at the State College of Iowa to determine which teachers were most effective with high-ability groups and which were most effective with low-ability groups; the evidence so obtained has led to revised teacher assignments so that each instructor works with the group with which he has greatest success.

It will be noted that no attempt has been made to compare the progress of one student with that of another student in the same group having a different initial score. If this is required, a procedure may be developed using a modification of the statistical techniques described below.


CONSTRUCTION OF CHARTS 


To develop such a chart as proposed here for any test or pair of tests, the first step is a "comparability study." If parallel forms are being used, a sample of students which will be accepted as representative and adequate is chosen and tested on one form, say Form A. The next day, the same students are tested with Form B. A scatter diagram is then made of the two sets of scores, using convenient class intervals along the A-axis. For each B-array (all cases whose A-scores lie within a given class interval), the mean and standard deviation are computed. Through the mean points so obtained, a smooth curve (line (1)) is passed, using either inspection or a curve-fitting technique. In each array, points are marked off one standard error of estimate above and below the line (using the value of the standard error obtained for that array, that is, the standard deviation of its B-scores). Through the points so obtained, smooth lines ((2) and (3)) are passed. These lines indicate what changes in score may be expected when errors of measurement, regression, practice effect, and lack of comparability of the two forms are taken into account; that is, how much a score would vary in successive testings under conditions which allowed no chance for change in the abilities being tested.

If the same test form is to be used to measure initial and final ability, a retest study like the comparability study should be made. If this is not practicable, a split-half reliability coefficient may be obtained and a regression line determined. Lines (2) and (3) may be added, using the computed standard error of estimate. This is an approximate procedure only, as the standard error of estimate may not be constant for all positions along the scale, and the necessary assumption that the regression is linear may be false.
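When only one form is available, the approximate procedure can be sketched as follows. Two choices here go beyond the text and are assumptions: the split-half coefficient is stepped up with the Spearman-Brown formula, and the retest is taken to have the same mean and standard deviation as the first testing (no practice effect). Line (1) is then the regression line y' = M + r(x - M), and the standard error of estimate is SD · √(1 − r²). The function names and the half-test scores are hypothetical.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approximate_no_change_lines(odd_half, even_half):
    """Approximate lines (1), (2), (3) for a single form given once.

    odd_half, even_half: each pupil's score on the odd- and even-numbered items.
    Assumes, as the text warns, a linear regression and a constant standard
    error of estimate; also assumes no practice effect, so the retest is
    expected to have the same mean and SD as the first testing.
    """
    r_half = pearson_r(odd_half, even_half)
    r = 2 * r_half / (1 + r_half)  # Spearman-Brown step-up to full length
    totals = [o + e for o, e in zip(odd_half, even_half)]
    mean = statistics.mean(totals)
    sd = statistics.pstdev(totals)
    se_est = sd * math.sqrt(1 - r * r)  # standard error of estimate

    def line1(x):
        return mean + r * (x - mean)  # most probable retest score, no growth

    def line2(x):
        return line1(x) + se_est

    def line3(x):
        return line1(x) - se_est

    return r, se_est, line1, line2, line3


# Hypothetical half-test scores for eight pupils.
odd = [10, 12, 8, 15, 9, 14, 11, 13]
even = [11, 12, 9, 14, 8, 15, 10, 12]
r, se_est, line1, line2, line3 = approximate_no_change_lines(odd, even)
print(f"r = {r:.3f}, standard error of estimate = {se_est:.2f}")
print(f"a first score of 29 predicts a retest of {line1(29):.1f}")
```

Because r is less than one, line (1) always lies closer to the group mean than the initial score does, which is the regression effect the chart is meant to absorb.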




To obtain useful lines describing normal growth, a large and representative sample must be selected, according to the principles which apply to the derivation of the usual type of test norm. It has been found necessary to prepare separate charts for each grade since, for example, normal growth from the tenth to the eleventh grade frequently does not follow the same line as normal growth from the eleventh to the twelfth grade. If, in a particular case, the charts separately prepared are found very similar, it is simple to combine them. J. Murray Lee has suggested⁵ that these charts would have greater usefulness for interpreting achievement tests if the chart for each grade were subdivided to show goals appropriate for each level of intelligence. The simplest approach to this would probably be preparation of three charts for each grade, one based on a sample of high, one of average, and one of low intelligence.

Having chosen a sample of students for each grade, it is necessary to test them at the beginning of the school year and again at the end, using parallel forms, or the same test if necessary. The data from these testings are made into a scatter diagram, and the results are treated by the same statistical procedures as were used in the comparability study. The resulting lines are the required lines (4), (5), and (6).
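The array computations behind lines (1) to (3) in the comparability study, and behind lines (4) to (6) here, can be sketched as below. Only the points for each array are computed; passing smooth curves through them, by inspection or a separate curve fit, is left to the chart-maker. Using the population standard deviation within each array is a choice made for this sketch, and the function name and sample scores are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def array_lines(pairs, interval_width, start):
    """Array statistics from a scatter diagram of (first, second) score pairs.

    First scores are grouped into class intervals of width interval_width
    beginning at start. For each array the function returns
    (interval midpoint, number of cases, mean second score,
    mean - SD, mean + SD): one point on the centre line and one on each of
    the lines one standard error of estimate below and above it.
    """
    arrays = defaultdict(list)
    for first, second in pairs:
        k = math.floor((first - start) / interval_width)
        arrays[k].append(second)

    rows = []
    for k in sorted(arrays):
        seconds = arrays[k]
        midpoint = start + (k + 0.5) * interval_width
        mean = statistics.mean(seconds)
        sd = statistics.pstdev(seconds)  # spread of this array about its own mean
        rows.append((midpoint, len(seconds), mean, mean - sd, mean + sd))
    return rows


# Hypothetical fall and spring scores for six pupils, intervals of width 1.0 from 3.0.
sample = [(3.1, 3.6), (3.3, 3.8), (3.4, 4.0), (4.2, 4.6), (4.4, 4.4), (4.1, 4.9)]
for midpoint, n, mean, lower, upper in array_lines(sample, 1.0, 3.0):
    print(f"{midpoint:.1f}: n={n}, lower={lower:.2f}, centre={mean:.2f}, upper={upper:.2f}")
```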

It will be noted that in each case, lines were established one standard error of estimate above and below the means of the arrays. This is an arbitrary procedure, found convenient in practice. Assuming that each array contains a normal distribution, it follows that approximately one-sixth of the cases (more precisely, 15.9 per cent) will fall above the upper line, and the same proportion below the lower one. In any case where inspection shows that distributions within arrays are very skewed, it is probably advisable to use percentiles within the arrays to determine lines (2) and (3), or (5) and (6).
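The one-sixth figure is the normal-curve area beyond one standard deviation, 1 − Φ(1) ≈ 0.159. The sketch below computes it and, for a skewed array, reads the 15.9th and 84.1st percentiles directly. Choosing those two percentiles, so that each line still cuts off about one-sixth of the array, is an assumption about how the percentile method would be applied; the function names are hypothetical.

```python
import math
import statistics


def share_beyond_one_se():
    """Share of a normal array above mean + 1 SD (equally, below mean - 1 SD): 1 - Phi(1)."""
    phi_of_one = 0.5 * (1 + math.erf(1 / math.sqrt(2)))  # normal CDF at z = 1
    return 1 - phi_of_one


def percentile_lines(seconds):
    """Lower and upper line positions for one skewed array, taken as its
    15.9th and 84.1st percentiles instead of mean -/+ one SD."""
    cuts = statistics.quantiles(seconds, n=1000, method="inclusive")
    # cuts[i] is the (i + 1)/1000 quantile, so index 158 is the 15.9th percentile.
    return cuts[158], cuts[840]


print(f"{share_beyond_one_se():.3f}")  # 0.159
lower, upper = percentile_lines(list(range(1, 101)))
print(lower, upper)
```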
While the procedures outlined here may appear elaborate, they yield considerably more information than the conventional type of test norm. The dangers inherent in any type of norm are present in this procedure also; it is of course unwarranted to assume that the progress normally made in the school or schools selected as a sample is to be desired in any other school, without first determining whether or not the objectives of the various schools are the same. It should be noted that the writers have used charts of the kind described above in working with classroom teachers, and have found that these teachers quickly learned to make correct use of the charts.

⁵ In conversation with one of the writers.


Studies required to prepare growth charts present a task which is often 
beyond the facilities of a classroom teacher working alone. In a school 
which has a continuing testing-guidance program and some one person with 
basic statistical training, such studies appear feasible. What is even more 
important is that test authors and publishers consider the adoption of this 
proposed type of standardization, to make it possible for the teacher to 
evaluate the student's progress in terms of a standard appropriate for him 
rather than in terms of a standard based on the average student. 


[Figure 1: plotting sheet with lines (1) to (6); horizontal axis, score on 1.41 (1.5 to 7.5); vertical axis, score on 1.42; marginal scale, frequency in standard sample (per cent), N = 130.]

FIGURE 1. PLOTTING SHEET FOR DETERMINING PUPIL AND CLASS GROWTH IN "RATIO", TEST 1.41-1.42, FROM GRADE 10 TO GRADE 11
















A TEST ON THE SCIENTIFIC METHOD 


ARTHUR JOHN TER KEURST, Ph.D.
Western Illinois State Teachers College
and
ROBERT E. BUGBEE, Ph.D.
Fort Hays Kansas State College


Editor's note: One of the goals of education is objectivity and scientific-mindedness. The authors offer a new test in this important area.

SINCE Western civilization attained its power with the aid of the scientific method, as developed by such leaders as Descartes and Bacon, it is necessary that we maintain as careful an understanding as possible of the method that underlies such a large part of our contemporary civilization. As one of the contributions of recent years, standardized tests were developed by which one can check his ability in a variety of fields. In spite of the many standardized tests that have been constructed in so many areas of subject matter, it is noteworthy that very few tests have been devised to determine the understanding of the scientific method that is so important to our lives. There is a pertinent reason why such a test would be valuable to education.

A valid test on the scientific method would indicate whether or not a student has made correct generalizations from his experiences with natural phenomena. It need only be noted here that one makes generalizations from many specific instances, and in turn applies these generalizations to other specific instances of the same general nature. Obviously, if a student has not made correct generalizations covering many phases of natural phenomena or lacks the correct procedure by which generalizations are made, it may be questioned whether or not he would be successful in the field of the sciences. On the other hand, if a person has arrived at a correct procedure for understanding natural phenomena, it is apparent that such an acquisition is vastly more important than the mere accumulation of isolated facts in the sciences.

It is the purpose of the test described in this report to provide a means by which teachers or students can check themselves on the understanding of the methodology of science.


The items for this test were obtained from the faulty use of the scientific method as revealed by the errors of students in classes in psychology and biology. (A procedure was judged erroneous whenever it failed to agree with the scientific method as explained by logicians.) These errors in the use of the scientific method, thirty in number, became the basis for the construction of an informal test reported in the 1939 issue of the Transactions of the Kansas Academy of Science. This informal test attained a coefficient of correlation of +.5058 ±.06 with the scores on the Psychological Examination of the American Council on Education for seventy-one freshmen, and +.6566 ±.06 for fifty-eight sophomores. The coefficients of correlation between the scores on this test and the Nelson-Denny Reading Test were calculated to be +.5669 ±.05 and +.5930 ±.05 for sixty-two freshmen and fifty-eight sophomores, respectively.
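The ± figures attached to these coefficients are read here as probable errors, the convention of the period; the authors do not name the formula, so that reading is an assumption. Under it, the usual expression PE = 0.6745(1 − r²)/√N reproduces the first figure. The function name is illustrative.

```python
import math


def probable_error_r(r, n):
    """Probable error of a product-moment correlation: 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


# Informal test vs. ACE Psychological Examination, seventy-one freshmen.
print(f"{probable_error_r(0.5058, 71):.3f}")  # 0.060
```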


By collecting additional errors in psychology and biology, the informal test was lengthened to seventy items. These seventy items, including both the test situations and the possible answers, were revised very carefully and sent for validation to one hundred specialists in science, including biologists, psychologists, educationists, zoologists, mathematicians, physicists, and geologists. We were pleased to have a 46 per cent return, classified as follows:

Zoology 19
Education
Psychology
Biology
Physics
Entomology
Geology
Chemistry
Botany


As might be expected, some disagreement arose from differing interpretations of the wording of the possible answers. All the items that failed to receive at least a two-thirds agreement among the specialists were eliminated, leaving fifty items for the test. Minor corrections that incorporated the suggestions of the specialists but did not affect the validity of the items were made. Consequently the validity of the test, as far as the correct interpretations of science are concerned, was established by the opinions of experts.


This test of fifty items was administered to 405 college students, consisting of 309 freshmen, thirty-seven sophomores, fifteen juniors, and forty-four seniors. These college students attended Kansas State Teachers College, Emporia, Kansas; Fort Hays Kansas State Teachers College, Hays, Kansas; and the College of Emporia, Emporia, Kansas. The scores for each student on each question were calculated. Then the test was rearranged in an increasing order of difficulty into its final form.
Although the validity of the test items as being descriptive of science was established by means of the opinions of specialists in science, the problem arises: will this test as a whole differentiate between good and poor students in the field of the sciences? As an answer to this problem, this test was administered to eighty-eight additional college students at the Western Illinois State Teachers College, Macomb, Illinois. This group of eighty-eight students consisted of forty-seven good and forty-one poor students in the sciences. The basis for the selection was the rating of ten instructors in Advanced Inorganic Chemistry, Analytic Geometry, Botany, Educational Psychology, General Agriculture, General Chemistry, Mechanics, and Physical Geology. Each instructor was asked to list five of the best students and five of the poorest students in his respective class. Difficulties, such as students dropping out of school, prevented the administration of the test to the entire group of fifty poor students. The classification of the students is presented in Table I.


It may be noted that the numbers and percentages of the students in the freshman and sophomore classes are fairly equal. Although proportionately more juniors are found in the good group than in the poor group, this slight imbalance is felt to be offset by the larger number of seniors in the poor group than in the good group. It is felt that in general the class composition of the two groups is approximately equal.


TABLE I

THE CLASSIFICATION OF THE GOOD AND POOR STUDENTS IN SCIENCE

                    Good Group             Poor Group
Class           Number   Per cent     Number   Per cent
Freshmen          16       34.03        13       31.72
Sophomores        18       38.29        16       39.04
Juniors            8       17.01         4        9.76
Seniors            5       10.63         8       19.52
Total             47      100.00        41      100.00










TABLE II

DIFFERENCES BETWEEN GOOD AND POOR STUDENTS IN SCIENCE IN TERMS OF CRITICAL RATIOS BETWEEN SCORES ON THE TEST ON THE SCIENTIFIC METHOD, PERCENTILE RANK ON PSYCHOLOGICAL EXAMINATION AND GRADE-POINT AVERAGES

                                  Good        Poor        Critical
                                  Students    Students    Ratio
Test on the Scientific Method
  Number of Cases                   47          41
  Mean                             37.82       31.00        5.01
  Standard Deviation                5.78     (illegible)
Psychological Examination
  Number of Cases                   42          38
  Mean                             69.75       35.90        5.24
  Standard Deviation               22.50       33.55
Grade-Point Averages
  Number of Cases                   47          41
  Mean                              2.11         .80       12.49
  Standard Deviation                 .42         .54











As presented in Table II, the critical ratio between the scores of the good and poor students, obtained by the use of the standard-error-of-difference formula,¹ is 5.01. According to Garrett:

It is customary to take a Difference/S.D. diff. of 3 as indicative of complete reliability, since ±3 S.D. includes practically all of the cases in the "distribution of differences" below the mean. A Difference/S.D. diff. greater than 3 is to be taken as indicating so much added reliability.
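The critical ratio in Table II is the difference between the group means divided by the standard error of that difference. The sketch below assumes the two groups are independent and takes each mean's standard error as SD/√N; under those assumptions the Psychological Examination figures reproduce the printed ratio of 5.24. The function name is illustrative.

```python
import math


def critical_ratio(mean1, sd1, n1, mean2, sd2, n2):
    """Difference between two independent group means divided by its standard error,
    sqrt(sd1**2 / n1 + sd2**2 / n2)."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se_diff


# Psychological Examination row of Table II: good students vs. poor students.
print(f"{critical_ratio(69.75, 22.50, 42, 35.90, 33.55, 38):.2f}")  # 5.24
```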


In comparison with the critical ratio on the test being presented, the critical ratios for the scores on the Psychological Examination of the American Council on Education and for the grade-point averages were also calculated. The critical ratio for the Psychological Examination was slightly higher, but not to a significant degree. The grade-point average was computed by weighting the letter grades earned by the eighty-eight students during the Fall Quarter of 1940. A grade of A received three honor points, B received two, and C received one; grades of D and F received no honor points. All of the students except nine of the forty-one poor students and three of the good students were pursuing four four-hour courses; the exceptions were studying three four-hour courses. The critical ratio for the grade-point average was 12.49, which was considerably higher than that for the other two criteria.

¹ H. E. Garrett, Statistics in Psychology and Education (New York: Longmans, Green and Co., 1926), p. 133.
With respect to the question, will this test differentiate between good and poor students in science, it may be concluded that it does differentiate, according to Garrett, with complete reliability. There is a reason why this test and also the Psychological Examination of the American Council on Education do not attain as high a critical ratio as that presented by the grade-point average. Many other factors, such as neatness, compliance with the demands of the instructor, manual dexterity, class attendance, industry, perseverance, and many others that could be named, contribute, in addition to knowledge of method or general intelligence, to the final grades of a given student. The authors found several students who performed very well on this test of the scientific method but received very low grade-point averages. On the other hand, not all the students rated as excellent by their instructors exhibited as superior an ability in the knowledge of the scientific method as their grades would indicate. Similar observations were noted with respect to the grades earned by some students and their performance on the Psychological Examination of the American Council on Education.
Since this test has considerable validity, as indicated by its ability to differentiate with complete reliability, it is also necessary to determine the reliability of the instrument. The coefficient of reliability of the test as calculated by the split-half (even versus odd) technique is +.6956 ±.01. If the Spearman-Brown Prophecy Formula is used, the coefficient of reliability is .8205. As another measure of the reliability of the test, the probable error of the score was also calculated. If the coefficient of correlation between the split halves is used, the probable error of the score is found to be ±2.53. If the result of the Spearman-Brown Formula is used, however, the probable error of the score is ±1.99. Such a result would indicate that, in fifty chances in one hundred, the score of the individual student will not vary in either direction by more than 2.53 points. Since the probable error of the score is rather low, we feel that the instrument has high reliability.
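Both figures rest on standard formulas, sketched below. The Spearman-Brown step-up reproduces the .8205 quoted above. The authors do not give the standard deviation they used for the probable error of the score, so the SD of 6.0 in the example is hypothetical, and PE = 0.6745 · SD · √(1 − r) is the conventional formula, assumed here rather than confirmed by the text.

```python
import math


def spearman_brown(r_half):
    """Estimated reliability of the full test from the correlation between its halves."""
    return 2 * r_half / (1 + r_half)


def probable_error_of_score(sd, reliability):
    """Probable error of an obtained score: 0.6745 * SD * sqrt(1 - reliability).
    About half of obtained scores should lie within this distance of the true score."""
    return 0.6745 * sd * math.sqrt(1 - reliability)


full = spearman_brown(0.6956)
print(f"{full:.4f}")  # 0.8205
# Hypothetical SD of 6.0: the higher coefficient gives the smaller probable error.
print(probable_error_of_score(6.0, 0.6956), probable_error_of_score(6.0, full))
```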
The difficulty in securing as many upper classmen as freshmen is admitted to be a specific weakness of our test. The norms and their standard deviations for the respective classes of the 405 college students in the Kansas schools are presented in Table III.














TABLE III
THE CLASS AVERAGE SCORES OF 405 COLLEGE STUDENTS ON THE TEST
ON THE SCIENTIFIC METHOD

                Number of
Class             Cases       Mean      S.D.
Freshmen           309        32.90     6.34
Sophomores          37        35.44     6.06
Juniors             15        40.64    10.08
Seniors             44        41.20     5.64
Total              405
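The Total row of Table III gives only the number of cases. As an arithmetic check of our own (not a figure the authors report), the mean for all 405 students can be recovered as the average of the class means weighted by class size. The sketch assumes the printed class means are exact.

```python
# Class sizes and means as printed in Table III.
TABLE_III = {
    "Freshmen": (309, 32.90),
    "Sophomores": (37, 35.44),
    "Juniors": (15, 40.64),
    "Seniors": (44, 41.20),
}


def pooled_mean(groups):
    """Case-weighted mean of group means: sum(n_i * m_i) / sum(n_i)."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


if __name__ == "__main__":
    groups = list(TABLE_III.values())
    print(sum(n for n, _ in groups))        # 405
    print(round(pooled_mean(groups), 2))    # 34.32
```

The pooled value sits close to the freshman mean because freshmen make up 309 of the 405 cases.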


Percentile norms for the number correct are also presented for the freshman class. The relatively small number of upperclassmen does not warrant the presentation of percentile ranks for each class. The score on this test is the number of exercises answered correctly.

Percentile    Number correct
   100            48.00
    90            41.10
    80            38.87
    70            37.14
    60            35.12
    50            33.00
    40            31.39
    30            30.04
    20            27.88
    10            24.14
     0            13.00

The test is hereby presented:

TEST ON THE SCIENTIFIC METHOD

NAME
STUDENT CLASSIFICATION
DIRECTIONS: Place the number of the BEST answer in the space in front of the number of the exercise.
1. The primary aim of science is to
(1) Refute religious or philosophical dogma
(2) Discount old or archaic ideas
(3) Substantiate the results of others
(4) Seek the truth by means of analyzed observation
2. The scientific method allows final interpretation to be based only on
(1) Current opinion
(2) Speculation
(3) Observed and analyzed data
(4) Mores and traditions










According to the scientific method, data should be interpreted by 
(1) Everybody. 
(2) The majority. 


(3) Experts. 
(4) The authorized. 


The scientist should use as a means for making his decisions his own 


(1) Emotions. 
(2) Intelligence. 
(3) Habits. 

(4) Instincts. 


With respect to the interpretation of data, the scientific method allows
interpretation to be made only as far as
(1) It fulfills popular opinion 

(2) The data justify it 

(3) The data do not justify it. 

(4) It supports tradition 


In the study of a scientific problem, it is necessary to point out as causal those 
factors which are 

(1) Nearest in point of time 

(2) Most readily explainable 

(3) Contributory to the results 

(4) Non-essential. 


To be scientific, one should 


(1) Adopt the questioning attitude

(2) Accept without verification the statement of others 

(3) Accept the statements that are claimed by others to be scientific. 
(4) Accept the statements of those with whom one agrees. 


According to the scientific method, the explanation of a known phenomenon
should be expressed in terms of


(1) Such forces as life urge, spirit, soul, etc 
(2) Known forces composing the phenomenon 
(3) Unknown factors in the phenomenon 

(4) Factors irrelevant to the phenomenon 


The scientific method must include for the interpretation of the final solution 
of a problem the consideration of 

(1) Only a few factors in order that the solution might become simple 
(2) The unknown factors 

(3) All the known factors 

(4) All observable, and essential factors 


The scientific worker should 


(1) Discard all previous advancements of knowledge 

(2) Accept purported authority as a tentative conclusion 

(3) Accept authority that is supported by usage of long standing 

(4) Accept authority that is supported by patriotic or religious feeling 







To be scientific, a person must derive his conclusions through 

(1) His own experience alone. 

(2) His own experiences and those of other normal individuals. 

(3) The experiences of normal individuals under abnormal conditions 
(4) Experiences derived by inner revelation 


When one follows the scientific method, he should 

(1) Discredit all authority. 

(2) Accept authority when the majority agree with it. 

(3) Accept the statements of those whom he considers to be authorities 
until opportunity for verification is offered 

(4) Accept the statements of those who claim that they are authorities 


That a valid conclusion may be obtained, the scientific method teaches that
every problem should be studied within

(1) A possible future setting 

(2) A previous setting 

(3) An abnormal setting 

(4) Its own natural setting 

The worker in science must treat newly discovered conclusions of other
workers with

(1) Contempt 

(2) Dogmatic acceptance 

(3) Tentative acceptance 

(4) Indifference 


According to the scientific method, in the study of a present situation a
person may accept an authoritative statement when

(1) The conditions underlying the authoritative statement and those of the
present situation are alike
(2) Popular demand sanctions it 
(3) The conditions underlying the authoritative statement appear to parallel
the present situation in a few essentials
(4) Authority appears to be the easiest explanation 


The method of pure science holds the scientist to a code of ethics that he must
(1) Think only of means by which human welfare can be best promoted 
(2) Not conflict with authority 

(3) Think only of attaining the truth irrespective of anything else 

(4) Not disrupt the social order 


To be scientifically accurate, an investigator should seek in the final solution
of a problem the weight or importance of

(1) Only one unknown factor 

(2) All the pertinent factors 

(3) Only the constant factors 

(4) Only the variable factors 

According to the scientific method, a supposition is substantiated when the
solution

(1) Is verified by observable data 

(2) Satisfies the previous opinions of the investigator
(3) Appears to be logical 


(4) Agrees with custom 










According to the scientific method, a person may use for a solution of a 
second problem the procedure employed in solving the first problem when 
(1) A few of the elements are comparable in both problems. 

(2) The non-essential elements are the same in both problems.

(3) The essential elements of both problems are the same.

(4) Cause and certain effects are concomitant. 


Scientific conclusions must be treated as 

(1) Absolute and changeless truths. 

(2) Explanations subject to possible revisions 
(3) Authoritative pronouncements. 

(4) Explanations of ultimate reality. 


A given factor is considered to be non-essential in the solution of a problem 
when 
(1) No correlation is found between the operation of the given factor and
the obtained results under variable conditions.
(2) A high correlation is found between the operation of the given factor 
and the obtained results under variable conditions. 
(3) A high correlation is found between the operation of the given factor 
and the obtained results under constant conditions. 
(4) No correlation is found between the operation of the given factor and 
the obtained results under certain constant conditions 


According to scientific terminology, a hypothesis is a 

(1) Law that has a number of exceptions which prove its validity. 
(2) Disproved law. 

(3) Newly discovered law 

(4) Preliminary generalization still waiting for proof 


To be scientific, a worker should consider the average or mean of a group 
(1) Without including the scatter or deviation about the mean. 

(2) By including the scatter or deviation about the mean. 

(3) By including the scatter or deviation only above the mean. 

(4) By including the scatter or deviation only below the mean.


The chief teaching purpose of a laboratory, conducted according to the 
scientific method, should be to 

(1) Teach students the use of scientific instruments. 

(2) Teach students only scientific accuracy and precision 

(3) Teach by object lessons what had been stated in textbooks 

(4) Teach students methods of analyzing problems. 

Scientific measurement demands that the method of measurement be deter- 
mined by the nature of the phenomena or materials 

(1) Being measured 

(2) Comparable to the ones being measured. 

(3) Dissimilar to the ones being measured 

(4) Opposite from the ones being measured. 

The ultimate aim of the scientific method is to 


(1) Express phenomena in terms of verified natural laws 
(2) Overthrow conclusions that are well established in the minds of the people.

(3) Verify what has already been discovered. 

(4) Discover incidental facts 







To be scientific, a person should use mathematical formulas only when the 
formula 

(1) Partially covers the conditions. 

(2) Symbolizes adequately the conditions it purports to cover. 

(3) Is said to symbolize adequately the conditions it purports to cover 

(4) Is the easiest and simplest method of solving the problem. 


In relation to the control group, the experimental group represents the normal
group with

(1) None of the factors changed. 

(2) All of the factors changed.

(3) The exception of a certain changed factor, or set of related factors, 
which is being studied. 

(4) Many changed factors chosen indiscriminately. 


According to the scientific method, responsibility for an answer to problems
that confront an individual rests with

(1) Society. 

(2) The individual 

(3) Tradition.
(4) Authority. 

In the solution of an original problem, the scientist must necessarily adopt 
those procedures which are 

(1) Trial and error 

(2) Authoritative. 

(3) Intuitive 

(4) Mystically inspired 


When a principle has had wide application and is found to be invalid in
certain phases, scientists apply the principle

(1) When authority or common usage warrants the use of the entire principle.

(2) In all situations the principle purports to cover. 

(3) In no situations whatever. 

(4) In situations covered by the valid phases of the principle. 

The control group in a scientific experiment represents the norm under 

(1) Any set of uncontrolled conditions

(2) A set of variable conditions. 

(3) A set of constant and widely variable conditions 

(4) A set of known constant and variable conditions 


The scientific method is applicable in 


(1) Only the physical sciences

(2) Only the physical sciences and social studies. 

(3) All fields. 

(4) All fields of knowledge based upon observable data. 


In scientific terminology a law is a description of phenomena 

(1) That, according to our present knowledge, operate invariably under given conditions

(2) That operate in the presence of many and varied exceptions in order 
that the rule may be proved. 

(3) Covering a bare majority of cases. 

(4) That operate under variable conditions 










According to the scientific method, valid conclusions are allowed only after 


(1) All cases have been studied carefully 

(2) The investigator has made a study of one hundred cases 

(3) The investigator's results are in agreement with a majority of other 
investigators of analogous problems 

(4) Enough cases have been studied that further research does not add materially to what is already known.


With respect to the emotional, moral, or aesthetic aspects of a scientific
problem, the scientific method and procedure demand that the worker in
pure science should

(1) Make his results conform to these aspects
(2) Be influenced by these aspects as long as human welfare is promoted
(3) Let his results be colored by these aspects
(4) Not be influenced by these aspects


As an aid to the scientific method, the chief value of mathematics is 


(1) Training in general accuracy 





(2) Training the mind in forms of logic
(3) Symbolic means for the expression of quantitative and qualitative relationships.
(4) Means for developing the intelligence of the worker

According to scientific terminology, data are objective when the sensory
experiences are
(1) Comparable for all normal investigators
(2) Consistent for one individual under a variety of conditions
(3) Unlike for one individual
(4) Unlike for various investigators


To be scientific, a person may experience through the activity of others only 
when the essentials in both experiences 

(1) Resemble each other 

(2) Are opposite. 

(3) Are alike. 

(4) Are unlike. 

According to the scientific method, if a student, upon repeated and accurate
checkings in basic facts, method, technique, and calculations, finds his answer
to a problem to be different from the book, he should


(1) Conclude that the book might be in error 

(2) Accept the answer in the book without further investigation 

(3) Change his procedure so that the answer in the book may be substan- 
tiated by his results 

(4) Forget about the problem. 


According to the scientific method, when the same cause is found to operate
in identical mechanistic situations, the results will be


(1) Similar. 
(2) The same 
(3) Opposite from each other 


(4) Unlike. 







According to the scientific method, scientific knowledge is gathered by 
(1) The direct and indirect use of the sense organs 

(2) Intuition or hunch

(3) Revelation 

(4) Authority 


According to the scientific method, when the same cause is found to operate
in two or more organic situations, the results will always be

(1) The same 

(2) Opposite 


(3) Known exactly and therefore predictable

(4) Subject to variation

The scientific method teaches that all questions which come within its [illegible]
(1) Be answered with [illegible]
(2) Be answered dogmatically and with assurance, even if temporarily [illegible]
(3) [illegible]
(4) Be left unanswered, if scientific answers are impossible at that time

With respect to the advancements in scientific knowledge, scientists [illegible]
(1) Means for interpreting knowledge gained directly or indirectly through [illegible]
(2) Technique for maintaining accuracy
(3) Procedure for [illegible] data
(4) [illegible] comprehensive viewpoint of life

After a person has gained eminence in a particular field, acceptable scientific method
(1) [illegible] speak with authority in fields allied to the [illegible] made careful scientific study
(2) Allows him to speak with authority in fields not closely related to the ones in which he has gained eminence
(3) Requires him to refrain from speaking authoritatively in all fields except the ones in which he has gained eminence
(4) Demands that he does not speak about material in any field

[illegible] in the [illegible] should give [illegible] knowledge of
(1) Fields based [illegible]







According to the scientific method, an analogy can be used in science only 
when the functioning characteristics are 

(1) Opposite 

(2) Comparable 

(3) Alike. 

(4) Unlike 

According to scientific terminology, observable data become subjective when
(1) They cannot be interpreted 

(2) They are used as a basis for exact statements 

(3) Different investigators have the same sensory experiences 

(4) They become subjected to interpretation 


Since this test on the scientific method presents to the student, in the form of an examination, some of the more basic principles of the method of science, the authors believe that such an instrument will be an aid in the training not only of workers in science but also of our lay population. When one considers the incorrect use of the methods of science, not only among professional workers in that field but also among our lay population, it seems imperative that the emphasis in our schools should be on the principles of the scientific method instead of the mere amassing of scientific data.





THE CONSTRUCTION AND EVALUATION OF A SCALE TO 
MEASURE ATTITUDE TOWARD ANY 
EDUCATIONAL PROGRAM 


RICHARD M. BATEMAN 


Peru, Indiana High School 
In a democracy the attitude of the people in a school community is always important. The author offers a new device in this area.


THERE is a constant search for techniques that might be utilized in evaluating the various types of educational activity to be found in the modern institutions of learning, whether on the elementary, secondary, or college level. In spite of the extensive progress made in the field of tests and measurements during the past quarter-century, there is still a lack of valid and reliable measures of psychological variables that are highly important, especially in the general field of social-psychological phenomena.

This lack of evaluative criteria is especially noticeable in the field of certain guidance activities. There is also a general lack of measuring devices available for evaluating the results obtained from the various types of school programs designed by educators to develop certain social-psychological attitudes.

The author, noting a lack of suitable scales to measure attitude toward the various types of educational programs that are prevalent in most schools (home room, auditorium, group guidance, individual guidance, etc.), developed a generalized scale to measure attitude toward any educational program.


A technique developed by H. H. Remmers¹ of Purdue University was utilized by the author in developing this measure of social-psychological phenomena or “attitudes”. This technique, a modification of the method used by Thurstone,² is described functionally in the explanation of the construction of the scales to measure attitude toward any educational program.


CONSTRUCTION OF THE SCALES 


In order to make this a general attitude scale, the statements have been so worded that they will apply to any program. A program is defined for present purposes as the selections or features of an educational presentation or entertainment of any type or order. The term attitude includes the sum of an individual's beliefs, feelings, prejudices, notions, ideas, and fears about any procedure.

¹ Remmers, H. H. Generalized Attitude Scales: Studies in Social-Psychological Measurement. Studies in Higher Education 26, Bulletin XXXV, No. 4 (Lafayette, Indiana: Purdue University, December 1934), pp. 7-17.
² Thurstone, L. L., and Chave, E. J. Measurement of Attitude Toward the Church (Chicago, Ill.: University of Chicago Press, 1929).

Attitude scales previously developed by Remmers³ and others were consulted for statements which were adapted to the author's subject. The statements selected for the initial process of elimination were especially suitable for this type of measuring device, since they had already gone through a sorting process in the perfection of other attitude scales.

The next procedure was to revise and edit this list of statements. Five typewritten copies of about 200 statements were made, and these were checked over by instructors and students of Purdue University and Peru High School. The directions for checking were as follows: (1) Rate each statement on a three-point basis: (a) excellent, (b) fair, (c) worthless. (2) Base rating on whether statements can be applied to all educational programs. (3) Suggest improvement of statements by giving number and improvement on a separate sheet.

By choosing the statements on which the judges agreed as best filling the requirements, a list of 120 statements was retained for experimental use.

The 120 statements were scaled by a representative group of 200 Peru High School students, representing proportionately the freshman, sophomore, junior, and senior groups, and equally divided as to boys and girls. The students were selected at random in order to obtain a “chance” representative sample. Following Remmers' technique, these groups sorted the statements into 11 piles, A to K, representing extreme favorableness at A, through neutral opinion at F, to extreme unfavorableness toward any educational program X at K.
The results of these sortings were tabulated, and from the tabulations the median and Q values were determined for each of the 120 statements.
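The article does not spell out how the median and Q values were computed from the sortings. A minimal sketch, assuming the usual Thurstone-style treatment of grouped judgments: the scale value is the interpolated median of the judges' placements, and Q is taken as the interquartile range (Q3 minus Q1), so a large Q signals disagreement among judges. The numbering of the piles (K = 1 through A = 11, which centers the neutral pile F on 6.0) and the pile counts in the usage lines are our assumptions, not data from the article.

```python
from typing import Dict, List, Tuple

# ASSUMPTION (not stated in the article): piles are numbered K = 1 through
# A = 11, so the neutral pile F is centered on 6.0, and each pile is treated
# as a unit-wide interval centered on its number (pile F spans 5.5 to 6.5).
PILES = "KJIHGFEDCBA"  # least to most favorable


def grouped_percentile(counts: List[int], p: float) -> float:
    """Linearly interpolated percentile (0 < p < 1) of judgments grouped in
    unit-wide piles; counts[i] is the number of judges using pile i + 1."""
    n = sum(counts)
    if n == 0:
        raise ValueError("no judgments recorded")
    target = p * n
    below = 0
    for i, f in enumerate(counts):
        if f > 0 and below + f >= target:
            lower_limit = (i + 1) - 0.5
            return lower_limit + (target - below) / f
        below += f
    raise ValueError("percentile out of range")


def scale_value_and_q(pile_counts: Dict[str, int]) -> Tuple[float, float]:
    """Return (median scale value, Q) for one statement, taking Q as the
    interquartile range Q3 - Q1."""
    counts = [pile_counts.get(pile, 0) for pile in PILES]
    median = grouped_percentile(counts, 0.50)
    q = grouped_percentile(counts, 0.75) - grouped_percentile(counts, 0.25)
    return median, q


if __name__ == "__main__":
    # HYPOTHETICAL sortings of one statement by 200 judges.
    sv, q = scale_value_and_q({"A": 100, "B": 60, "C": 40})
    print(round(sv, 2), round(q, 2))  # 10.5 1.33
```

Under this convention a statement placed in pile F by every judge receives a scale value of 6.0, matching the indifference point the article adopts for its scores.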


Sixty statements were finally selected to make two equivalent forms, A and B, of the scale. Statements with high Q values were not used. The two forms, together with the scale value and Q value for each item, follow:


³ Remmers, H. H., and others. Op. cit.





[Scale values and Q values for the items of Form A and for items 1 to 17 of Form B are illegible in this copy.]


FORM A

1. This program was better than any other.
2. This program was outstanding.
3. I really enjoyed this program.
4. This program will mean a great deal to me when I am old.
5. This program gave me a great deal of pleasure.
6. This program was valuable to me.
7. I could enjoy more programs of this type.
8. This program should be liked by everyone.
9. This program contributes to better living.
10. This program was interesting.
11. This program has more merit than demerit.
12. This program is liked only fairly well.
13. I enjoyed only parts of this program.
14. This program could be much more interesting.
15. This program will benefit only the brighter pupils.
16. I am careless in my attitude toward this program.
17. This program has its drawbacks.
18. I like many other programs better than this one.
19. Quite a number of things about this program annoy me.
20. This program has limitations and defects.
21. This program has more disadvantages than advantages.
22. Few students will gain anything from this program.
23. The minds of students are not kept active by this program.
24. This program has very little educational value.
25. I have a feeling of hatred for this program.
26. This program does more harm than good.
27. I wouldn't listen to this program unless I had to.
28. This program would be enjoyed only by stupid people.
29. This program accomplished nothing for the individual or group.
30. This program is unfit for anyone.


FORM B

1. This program was extremely worthwhile.
2. This program has an irresistible attraction to me.
3. This program makes one think.
4. This program is a benefit to everyone.
5. This program was very good.
6. This program is fundamental for good social life.
7. Every student should gain something from this program.
8. This program should make for better citizens.
9. This program was worthwhile.
10. Most students should gain something from this program.
11. This program was not boring.
12. My likes and dislikes for this program are balanced.
13. I would enjoy this program if it were changed somewhat.
14. This program isn't bad but it isn't good either.
15. This program would be all right if it weren't for a few disagreeable features.
16. Some people would like this program but more would not.
17. I have enjoyed other programs more than this one.













1.52  18. I don't believe this program will do anyone any good.
91    19. This program has several disagreeable features.
1.36  20. The minds of students are not kept active in this program.
38    21. This program is foolish.
1.04  22. I don't care about this program.
1.03  23. This program benefits too few people.
1.30  24. This program is frowned upon by intelligent people.
1.20  25. I am not interested in this program.
1.19  26. This program is not endorsed by sane people.
94    27. This program should not be repeated for other groups.
66    28. This program was very boring.
80    29. This program is of no use to anyone.
32    30. I hate this program.


A tendency to bunch at the two extremes of the continuum is to be noted in both forms of the scale shown. This is typical of practically all of the attitude scales developed by this method and is due in all probability to the nature of language itself, to the statements themselves, or to a lack of discrimination of degrees of difference by the sorters, or to both.
As an inspection of the scales will show, the statements are arranged in regular descending order of magnitude of their scale values, a radical departure from the random arrangement used by Thurstone.
Experimental investigations by the author⁴ and Sigerfoos,⁵ under the direction of Remmers, of three possible arrangements (descending order, ascending order, and random arrangement) showed no significant differences. On this account the regular arrangement, as against the random arrangement, was indicated as a matter of economy of scoring.
The median scale value of the statements endorsed is taken as the individual's attitude score. This method facilitates scoring as compared with the method used by Thurstone and his associates. The median is used instead of the arithmetic mean, as the former yielded a slightly higher correlation between Forms A and B than did the latter in several attitude measurements scored by both measures of central tendency.


⁴ Bateman, Richard M. The Relationship between Attitudes toward School Subjects and Certain Other Variables. Studies in Higher Education 26, Bulletin XXXV, No. 4 (Lafayette, Indiana: Purdue University, December 1934), pp. 88-97.
⁵ Sigerfoos, C. C. The Validation and Application of a Scale of Attitude Toward [illegible]. Studies in Higher Education 31, Bulletin XXXVII, No. 4 (Lafayette, Indiana: Purdue University, December 1936), pp. 177-191.







In interpreting the results obtained from these attitude scales, 6.0 is obviously the indifference point on the scale. Any attitude score higher than 6.0 indicates to that extent a favorable attitude, and any score less than 6.0 is correspondingly an indication of an unfavorable attitude.
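The scoring rule (the median scale value of the statements a pupil endorses) and the 6.0 indifference point can be sketched as follows. This is a minimal sketch; the scale values in the usage lines are hypothetical placeholders, not values taken from Forms A and B.

```python
from statistics import median
from typing import Sequence

INDIFFERENCE_POINT = 6.0  # indifference point stated in the article


def attitude_score(endorsed: Sequence[float]) -> float:
    """Median scale value of the statements a respondent checked."""
    if not endorsed:
        raise ValueError("no statements endorsed")
    return median(endorsed)


def interpret(score: float) -> str:
    """Classify a score relative to the indifference point."""
    if score > INDIFFERENCE_POINT:
        return "favorable"
    if score < INDIFFERENCE_POINT:
        return "unfavorable"
    return "indifferent"


if __name__ == "__main__":
    # HYPOTHETICAL scale values of the statements one pupil checked.
    checked = [10.1, 9.9, 8.4, 7.2, 5.6]
    s = attitude_score(checked)
    print(s, interpret(s))  # 8.4 favorable
```

Using the median means a single endorsed statement far from the rest does not pull the score the way it would with a mean, which is consistent with the author's preference for the median.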


RELIABILITY AND VALIDITY OF SCALES 


In determining the reliability of the two scales, A and B, the attitudes of a group of 200 representative Peru High School students, selected at random and equally distributed over the ninth, tenth, eleventh, and twelfth grades, were measured toward an assembly program: a talk on conservation by a representative from the Indiana State Department of Conservation.

Directly after the assembly program, half of the experimental group of students were measured first with Form A while the other half of the group was being measured with Form B, and then the procedure was reversed. This procedure was followed in order to prevent any effect that might be the result of answering one form first.

A correlation of .87 ± .01 was obtained from a comparison of the two scales, A versus B, for the total group. A reliability of this size is sufficiently high for this type of group measurement.
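The ± figure attached to this correlation is presumably a probable error. The sketch below assumes the conventional large-sample formula, PE_r = 0.6745(1 − r²)/√N; that the author used exactly this formula is our assumption. With N = 200 it gives about .012 for r = .87, which rounds to the reported .01, and it likewise reproduces the 0.17 ± 0.06 reported for DeCamp's 115 freshmen in the following article.

```python
import math


def probable_error_r(r: float, n: int) -> float:
    """Large-sample probable error of a Pearson r: 0.6745 * (1 - r**2) / sqrt(N)."""
    if n < 2:
        raise ValueError("need at least two cases")
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


if __name__ == "__main__":
    print(round(probable_error_r(0.87, 200), 3))  # 0.012
    print(round(probable_error_r(0.17, 115), 3))  # 0.061
```

The formula shrinks both as r approaches 1 and as N grows, which is why a high correlation on 200 cases carries so small a probable error.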

The validity of the scales is based upon the logic underlying the construction of such generalized scales, since the author was unable to discover an outside criterion suitable for comparative purposes.


CONCLUSION 
Some question might be raised as to the acceptability of the described scales as a standardized rating scale. There is no evidence that the ratings of the 200 representative Peru High School students are representative of high school students in general.


The technique described in the construction of these new scales might be used by others in developing measures of social-psychological phenomena, using the results herein as illustrative and useful for comparable data.



















REVISED STANFORD-BINET FOR UNIVERSITY STUDENTS* 


MILDRED B. MITCHELL
Bureau for Psychological Services, Minnesota 


Editor's note: The author finds the revised Binet an improvement over the 1916 edition for liberal arts college freshmen. It is still unsatisfactory for predicting success in the college of medicine.
THE original Stanford-Binet was found to be inadequate for discriminating at the college freshman level. For example, DeCamp (2) found a correlation of only 0.17 ± 0.06 between the Stanford-Binet and freshman grades for 115 freshmen. The average IQ was 106.0, with a sigma of 8.7. The maximum possible, of course, was 122. Terman and Merrill have added many tests at the upper levels in the 1937 edition of the Stanford-Binet, so that it is now possible for an adult to make an IQ as high as 152. The question arose as to whether the new Binet would correlate better with college grades than the old Binet did.
To determine this, a selection of university freshmen¹ in Liberal Arts and all seniors in the Medical School at the State University of Iowa were given Form L of the Revised Stanford-Binet. For purposes of comparison, results on several other tests were also obtained.


THE UNIVERSITY FRESHMEN 
Some 1,200 entering freshmen took the Iowa Qualifying Examinations in September 1938. A sampling was tested from those in the lowest, middle, and upper 10 percentile groups on these examinations. The results for these groups are presented in Table I. There is no overlapping between those tested from the lowest 10 percentile group and those tested from the highest 10 percentile group. The reliability of the difference for these two groups is 3.0.
Since 63 percent of the sixty-seven freshmen examined fell below the 50th percentile rating on the Iowa Qualifying Examination, the mean IQ of 115 is too low for the class. The mean for the twelve students tested from the middle 10 percentile group is 122, which is probably more nearly the mean for the freshman class (4).
* This paper was read at the American Association for the Advancement of Science meeting, Columbus, Ohio, 1939.
¹ The writer wishes to thank Dean Lonzo Jones, Dr. Dewey Stuit, and Miss Mildred Heald for their splendid co-operation in obtaining freshmen for examination and for making the grades and scores on the Iowa Qualifying Examination available.









TABLE I
IQ'S ON THE REVISED STANFORD-BINET, FORM L, FOR UNIVERSITY FRESHMEN IN THE
LOWEST, MIDDLE, AND UPPER 10 PERCENTILE GROUPS ON THE
IOWA QUALIFYING EXAMINATION

Percentile group on
Iowa Qualifying
Examinations          Cases     Mean     Median     Range          Sigma
Lowest 10               21      101.6     103        81 to 119      7.7
Middle 10               12      121.7     122.5     106 to 134      6.4
Upper 10                11      130.1     127       123 to 143      5
The Pearson product-moment correlation, for our sample, between freshman grades for the entire year and raw scores on the Iowa Qualifying Examination was .74 (see tabulation below). The correlation between the Revised Stanford-Binet IQ and freshman grades was .64, which is practically no better than the .62 obtained between the grades and the qualifying examination for the whole class the first semester, and not quite as high as the .74 obtained for this group for the whole year, but it is certainly much better than is usually found between freshman grades and the old Stanford-Binet. The correlation between the Iowa Qualifying Examination and the Revised Stanford-Binet was found to be .76.
It would seem, then, that the Revised Stanford-Binet is quite a satisfactory measure of ability to do freshman work at the University of Iowa. At least, it compares favorably with other methods now in use for measuring the ability of college freshmen. For example, Langlie (3) found an identical correlation, namely .64, between first-term grades and results on the Ohio State University Psychological Examination, Form 18, for 187 freshmen at Wesleyan University in 1935. His correlations tended to decrease with length of time in the University. We shall see an even more marked decrease when we study our results for senior medical students.
                                      Qualifying     1937 Stanford-
                          Grades      Examination    Binet
Grades                                    .74            .64
Qualifying Examination     .74                           .76
1937 Stanford-Binet        .64            .76
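The coefficients in the tabulation above are Pearson product-moment correlations computed over the same students' paired scores. A minimal Python sketch of that computation follows; the function name `pearson_r` and the six pairs of scores are hypothetical and are not the author's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-products and squared deviations from the two means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with no spread has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical IQs and year grade averages for six students (not the author's data)
iq = [101, 108, 115, 122, 128, 135]
grades = [1.9, 2.3, 2.1, 2.8, 2.6, 3.2]
print(round(pearson_r(iq, grades), 2))
```

The same function applied to each pair of columns would fill in a small intercorrelation table like the one above.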





REVISED STANFORD-BINET 


SENIOR MEDICAL STUDENTS 


All seniors in the School of Medicine were given the Revised Stanford-Binet and the Otis Self-Administering Tests of Mental Ability, Higher Examination. All but five of the students had taken the Moss Medical Aptitude Test.² Fifty-five of the eighty-seven students were also given the short form of the Stanford-Binet, i.e., the 1916 edition. Since the vocabulary test is generally considered to be the most important single test on the Stanford-Binet, the vocabulary score in terms of number of words correct was also considered separately. Intercorrelations were figured between the 1916 Stanford-Binet, the Revised Stanford-Binet, the Otis, the Moss Medical Aptitude Test, grades, and the Vocabulary test (Table II). The vocabulary correlated lowest of all, only .04 with grades. The correlations of the four other tests with grades ranged from .15 to .18, which is practically what De Camp(2) found between freshmen grades and the 1916 Stanford-Binet. The correlations are all positive but so low as to be of little value in predicting grades. The correlation of .18 between the Moss Medical Aptitude Test and grades is essentially in agreement with the results obtained at the University of Minnesota Medical School. Cavett, Henrici, and Lindley(1) found a correlation of .17 between the Moss and the average medical grades during the first two years and a correlation of .16 with grades during the sophomore year.³ These correlations are not significantly different from the correlations between the intelligence tests and grades. In passing, it may be noted that the Moss test correlates higher with the two editions of the Stanford-Binet (.57 and .56) than with the Otis (.37).
Even though the correlations of the various tests with grades for the four years were all low, it was felt that there might still be some differences between the scores of those ranking highest in their class and those ranking lowest. The mean scores for the five lowest-ranking students were consistently lower than the class means on all the tests. The mean scores for the highest-ranking students, and the means for those elected to AOA, were consistently higher than the class means on all the tests. The differences, however, were not statistically reliable.
² The writer is indebted to Dean E. M. MacEwen of the State University of Iowa Medical School for results on the Moss Medical Aptitude Test and for making the grades available to the writer.
³ They obtained a correlation of .33 between the Minnesota Medical Aptitude Tests and grades for two years of medicine.







TABLE II

CORRELATIONS FOR SENIOR MEDICAL STUDENTS BETWEEN AVERAGE GRADES FOR FOUR YEARS

                        Stanford-Binet
Tests*                  1916     1937     Otis     Moss     Grades    Vocabulary
Stanford-Binet 1916       -      .69      .60      .57       .16        .64
Stanford-Binet 1937     .69        -      .58      .56       .15        .63
Otis                    .60      .58        -      .37       .15        .44
Moss                    .57      .56      .37        -       .18        .44
Grades                  .16      .15      .15      .18         -        .04
Vocabulary              .64      .63      .44      .44       .04          -

* IQ on 1916 Stanford-Binet, IQ on 1937 Stanford-Binet, IQ on Otis Self-Administering Higher Form, Moss Medical Aptitude percentile, and number of words correct on New Terman Vocabulary.

There are at least two reasons for the low correlations between the Revised Stanford-Binet and senior medical students' grades. In the first place, the seniors are a highly selected group. If the whole of the freshman medical group from which the senior group evolved had remained free from the eliminative action present in the medical school, the senior group would have been about one-third larger, the range would have been greater, and the correlation higher.⁴ The range in IQ on the Revised Stanford-Binet for the medical students was 37 points (Table III), while for the Liberal Arts freshmen it was 62 points (Table I). In the second place, there are limitations at the upper levels of the Revised Stanford-Binet. It is impossible, for instance, to obtain a level of failures for most of the medical students (5). The distribution of IQ's is shown graphically in a previous article (4).
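The footnote to the range argument above gives the figures for a correction for restricted range, but its formula is not legible in the scanned text. The sketch below assumes one common form of the correction, often attributed to Kelley, which holds the standard error of estimate constant across the narrow and the wide group: σ√(1 − r²) = Σ√(1 − R²). This is an assumption, not the author's confirmed formula; with the footnote's inputs it gives about .70 rather than the .60 reported, so the author evidently used a different formula or different inputs. The function name is hypothetical.

```python
import math


def correct_for_restricted_range(r, sd_restricted, sd_unrestricted):
    """Estimate R in the wider group from r observed in a narrower group.

    Assumed correction (not recovered from the source): the standard error of
    estimate, sd * sqrt(1 - r**2), is taken to be the same in both groups.
    """
    if not -1 < r < 1 or sd_restricted <= 0 or sd_unrestricted <= 0:
        raise ValueError("need |r| < 1 and positive standard deviations")
    ratio_sq = (sd_restricted / sd_unrestricted) ** 2
    big_r_sq = 1 - ratio_sq * (1 - r ** 2)
    if big_r_sq < 0:
        raise ValueError("inputs are inconsistent with the assumption")
    # Keep the sign of the observed coefficient
    return math.copysign(math.sqrt(big_r_sq), r)


# Footnote figures: r = .15, sigma 9.40 (medical seniors) widened to 13 (freshmen)
print(round(correct_for_restricted_range(0.15, 9.40, 13.0), 2))  # about .70, not the .60 reported
```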
SUMMARY AND CONCLUSIONS 


The Revised Stanford-Binet Form L correlated .64 with freshmen grades for a sampling of University of Iowa freshmen in Liberal Arts. It correlated only .15 with the grades for four years in medicine (class of 1939). The new Binet, therefore, compares favorably with other tests of … and scholastic ability, such as the Iowa Qualifying Examination and the Ohio State University Psychological Examination, and is a definite improvement over the 1916 edition of the Stanford-Binet at the freshman level. On the other hand, it is just as unsatisfactory as the 1916 edition and other tests at the senior medical level. Honor students in medicine tend to score higher on all tests than the average, and the lowest-ranking students tend to test lower. The low correlations between grades and test scores for the senior medical students may be attributed to the highly selected character of the group and to the limitations of the tests at the upper levels.

⁴ When the sigma is increased from 9.40 (σ for the medical students) to 13 (σ for the freshmen), the r between grades and the Revised Stanford-Binet is raised from r = .15 to R = .60 by the formula [formula illegible in the original].

TABLE IV

MEAN, SIGMA, AND RANGE FOR SENIOR MEDICAL STUDENTS

Test*                          Mean     Sigma    Range
Moss Medical Aptitude            …        …      6 to 98
Otis Self-Administering          …        …      100 to 133
Stanford-Binet 1937 IQ           …       9.40    111 to 148
Stanford-Binet 1916 IQ           …        …      90 to 122
Number of words, vocabulary      …        …      24 to 41

* Moss Medical Aptitude Test, Otis Self-Administering, Revised Stanford-Binet, Stanford-Binet (1916), and number of words on the new Vocabulary.


REFERENCES 


1. Cavett, …, Henrici, A. T., and Lindley, S. B. "Tests of Medical Aptitude at …" Journal of the Association of American Medical Colleges (September, 1937).
2. De Camp, … E. "Studies in Mental Tests: Army Alpha, Thurstone IV, and Binet (…mon R.)." School and Society, XIV (1921), pp. 254-258.
3. Langlie, T. A. "Predicting Scholarship." Journal of Higher Education, VI (1938), pp. 390-391.
4. Mitchell, M. B. "Revised Stanford-Binet for Adults." … 34 (1941), pp. 516-521.
5. Mitchell, M. B. "Irregularities of University Students on the Revised Stanford-Binet." Journal of Educational Psychology, XXXII (1941), pp. 513-522.








THE VOCABULARY TESTS OF THE REVISED STANFORD-BINET
AS INDEPENDENT MEASURES OF INTELLIGENCE

GEORGE SPACHE, Ph.D.

Psychologist, Friends Seminary, Brooklyn Friends School and the Grace Church Schools, New York City, 15 Rutherford Place, New York City


Editor's note: Understanding of vocabulary has been known for some time to be closely associated with academic aptitude. The author finds it no safe substitute, though, for more comprehensive measures of intelligence.


ON SEVERAL occasions, Terman(1) has remarked that the vocabulary test of either of the Stanford-Binet scales is markedly related to the results on the entire tests. He characterizes the vocabulary test as "the most valuable single test in the scale" (1, p. 302). In support of this, Terman gives correlations of .65 to .91, with an average of .81, between the vocabulary test and mental ages on the Revised Stanford-Binet. Similar evidence is offered by Elwood(2), who found a correlation of .978 between scale mental ages and vocabulary score.


This evidence of the validity of the vocabulary tests has prompted an attempt to determine the possibility of their use as independent, abbreviated


TABLE I

RESULTS ON THE VOCABULARY TESTS CONTRASTED WITH MENTAL AGES FROM THE ENTIRE SCALE

Variables
1. Vocabulary Score vs. M.A. Full Scale
2. Vocabulary Score vs. M.A. Short Scale
3. Vocabulary M.A. vs. M.A. Full Scale
4. Vocabulary M.A. vs. M.A. Short Scale
5. Vocab. M.A. vs. M.A. Full or Short Scale
6. Vocab. and Def. M.A. vs. M.A. Full or Short Scale
7. Pic. Vocab. M.A. vs. M.A. Full or Short Scale (C.A. under 4-1)
8. Pic. Vocab. M.A. vs. M.A. Full or Short Scale (C.A. under 4-7)
9. Pic. Vocab. and Vocab. M.A. vs. M.A. Full or Short Scale
10. P-V, Def. and Vocab. M.A. vs. M.A. Full or Short Scale





. 847 


Mean Ss 


Score 9.4 
Scale 8 8.2 
Score 9.4 
Scale 8 8&6 
Vocab. 9—- 1.4 
Scale 8- 8.2 
Vocab. 9 0.5 


Scale 8 8.6 
Vocab. 8 5.7 
Scale 7-— 6.5 3 
| Vocab. 8 0.9 3 
Scale 6— 5.3 3 
»~V 3— 7. 3) 
Scale 3— 9.7] 
P-V 3- 9 
Seale 4-2 
P-\ 4-11.6 
Scale 5 1 | 
P-V 4— 8.2 
Seale 4-9 ‘ l 




TESTS OF THE REVISED STANFORD-BINET 513 





measures of intelligence. Since, in a sense, it too is a vocabulary test, the Picture Vocabulary section of the Revised Stanford-Binet has also been studied in this fashion.


In the consideration of the use of the vocabulary tests as independent measures of intelligence, the major facts of the table support the conclusion that there are marked relationships between vocabulary M.A.’s and M.A.’s from either a long or a short scale. The conclusion is also possible that the vocabulary M.A. is superior to the vocabulary raw score in predicting scale M.A.’s. There was a slight tendency to overestimate scale M.A.’s when using the vocabulary M.A.’s as a predicting agent (particularly in predicting full scale M.A.’s), as evidenced in the higher mean vocabulary M.A.’s. This tendency may be a result of this particular population, which was drawn from private nursery and college preparatory schools where median I.Q.’s are well over 100. In such a population, one would expect to find vocabulary abilities well in advance of other mental abilities, in keeping with the well-known results of private-school and gifted-child testing. This result does not invalidate, however, the conclusion that scale M.A.’s may be predicted with a good degree of accuracy by use of vocabulary M.A.’s.


In our population, underestimation or overestimation of scale M.A. by use of the vocabulary M.A. was as great as three years in some cases. Underestimation of scale M.A. by approximately one year or more occurred in 14 percent of the cases; overestimation of similar amounts occurred in … percent. The tendency to overestimation is much less, however, in the larger population (see line 5), when vocabulary M.A. is correlated with M.A. from full or short testing combined. This is probably due to the fact that short testing tends to give slightly higher M.A.’s than does full testing (4). Hence short test M.A.’s and vocabulary M.A.’s tend to resemble each other closely, since both are apt to overestimate full scale M.A.’s. In the larger population, numbering 86, overestimations of scale M.A. by more than a year occurred in only 9 percent of the cases; underestimations by similar amounts occurred in 15 percent of the entire population. The smaller difference between mean vocabulary M.A.’s and scale M.A.’s in the case of short scale testing (see line 4) further confirms that the tendencies to overestimation or underestimation are less, and the resemblance greater, when vocabulary M.A.’s are compared with short scale results than with full scale results.
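The over- and underestimation percentages quoted above are simple tallies of cases in which the vocabulary M.A. misses the scale M.A. by more than a year. A Python sketch of that tally follows; the function name and the five pairs of mental ages (in months) are hypothetical.

```python
def estimation_error_rates(pairs, threshold_months=12):
    """Percent of cases in which a predicted M.A. over- or underestimates the scale M.A.

    pairs: list of (predicted_ma_months, scale_ma_months) tuples.
    A miss is counted only when it exceeds threshold_months (one year by default).
    """
    if not pairs:
        raise ValueError("no cases")
    over = sum(1 for pred, scale in pairs if pred - scale > threshold_months)
    under = sum(1 for pred, scale in pairs if scale - pred > threshold_months)
    n = len(pairs)
    return 100 * over / n, 100 * under / n


# Hypothetical (vocabulary M.A., scale M.A.) pairs in months
cases = [(110, 96), (100, 102), (90, 105), (120, 119), (85, 84)]
print(estimation_error_rates(cases))  # (20.0, 20.0)
```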

If the definitions test (Form L, year V, test 3) is considered as a lower extension of the vocabulary test, it is possible to increase the range of vocabulary M.A.’s at the lower level and include many of the younger subjects. In the scattergram, the definitions test was treated as the vocabulary level between a score of 0 and 5, and assigned an M.A. of 5 years. Vocabulary M.A.’s with and without inclusion of the definitions test are contrasted with scale M.A.’s in lines 5 and 6 of the table. Use of the definitions test slightly increases the efficiency of prediction of scale M.A.’s from vocabulary M.A.’s. However, the inclusion of the definitions test increases the tendency to overestimate scale M.A.’s, owing to the fact that the procedure of scoring permits the achievement of a vocabulary M.A. by more subjects with lower scale M.A.’s. Underestimation of scale results by more than a year occurs in 14 percent of the cases; overestimation, in 26 percent. Overestimations occur in almost three times as many cases when the definitions test is employed as a lower extension of the vocabulary test. Underestimations of scale M.A. are similar with or without the definitions test.
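The scoring rule for the definitions test as a lower extension can be sketched as follows. The norms table is hypothetical (the published vocabulary norms are not reproduced in the text), and placing the first vocabulary level at a raw score of 5 is an assumption made for illustration; only the fallback M.A. of 5 years comes from the text.

```python
def vocabulary_ma_years(vocab_score, passed_definitions, norms):
    """Vocabulary M.A. (years), using the definitions test as a lower extension.

    norms: HYPOTHETICAL list of (minimum_raw_score, ma_years), ascending by score,
    standing in for the published vocabulary norms, which are not given here.
    Returns None when neither the vocabulary nor the definitions test yields an M.A.
    """
    ma = None
    for min_score, ma_years in norms:
        if vocab_score >= min_score:
            ma = ma_years  # keep the highest level whose minimum score is reached
    if ma is None and passed_definitions:
        ma = 5.0  # below the first vocabulary level: definitions test counts as M.A. 5 years
    return ma


hypothetical_norms = [(5, 6.0), (8, 8.0), (11, 10.0)]  # invented for illustration
print(vocabulary_ma_years(3, True, hypothetical_norms))   # 5.0
print(vocabulary_ma_years(9, False, hypothetical_norms))  # 8.0
```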

Tests such as the Picture Vocabulary served as independent measures of intelligence long before their incorporation into the Stanford-Binet scale, notably in such scales as the Van Alstyne Picture Vocabulary Test(5). An attempt to use the picture vocabulary series of the Revised Stanford-Binet, Form L, as an independent measure of intelligence is presented in lines 7 to 10 of the table. Of course, any conclusions concerning the picture vocabulary test must be tentative owing to the small number of cases concerned.

The picture vocabulary test M.A. is markedly related to scale M.A. in cases under 4 and those under 4-6 in chronological age. At higher chronological ages, the tendency to underestimate scale M.A.’s becomes increasingly evident, both in the greater difference between the mean P-V M.A. and scale M.A. and in the lesser variability of the P-V M.A.’s. Owing to the small number of cases, it is difficult to conclude whether the P-V M.A. functions as well at its levels of usefulness as the vocabulary M.A. does at the higher levels.


If the M.A.’s from the P-V and vocabulary tests are averaged, it is possible to determine another type of vocabulary M.A. Data regarding the relationship of these average M.A.’s with scale M.A.’s are given in line 9 of the table. Again, the number of cases is small, but the data support the tentative conclusion that there is a marked relationship between the M.A.’s, with a tendency for the combined vocabulary M.A.’s to underestimate the scale results.


To increase the number of cases included in a study of the combined vocabulary and P-V mental age, the definitions test may again be considered as an extension of the vocabulary test or as a link between the two series of vocabulary items. M.A.’s derived from the P-V were averaged with M.A.’s from the definitions test if the vocabulary test was failed, or otherwise with the vocabulary M.A. The relationship between the combined P-V, definitions, and vocabulary test M.A. and scale M.A.’s is slightly greater than the relationship without use of the definitions test. Moreover, the tendency to underestimate scale M.A.’s is much less than when using the simple average of the P-V and vocabulary M.A.
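The combined M.A. described above averages the P-V M.A. with the vocabulary M.A., falling back on the definitions-test M.A. when the vocabulary test was failed. A minimal Python sketch, with hypothetical function and parameter names:

```python
def combined_pv_ma(pv_ma, vocab_ma=None, definitions_ma=None):
    """Average the picture-vocabulary M.A. with a verbal M.A. (all in years).

    The vocabulary M.A. is used when the vocabulary test yielded one; otherwise the
    definitions-test M.A. is used, as the text describes. Returns None if neither exists.
    """
    verbal_ma = vocab_ma if vocab_ma is not None else definitions_ma
    if verbal_ma is None:
        return None
    return (pv_ma + verbal_ma) / 2


print(combined_pv_ma(4.5, definitions_ma=5.0))  # 4.75
```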


To sum up the experiment in the use of various portions of the Revised Stanford-Binet scale as independent measures of intelligence, we may conclude, with due respect for the limitations imposed by the small numbers of cases concerned:


1. M.A.’s derived from the Stanford-Binet scale may be predicted with a fair degree of accuracy by the M.A.’s from the vocabulary section. The tendency to overestimation of scale M.A.’s in this prediction (which may have been due to the nature of the population used in this experiment) is less in predicting scale M.A.’s derived from abbreviated testing than in predicting those derived from full scale tests.

2. Prediction of Stanford-Binet scale M.A.’s when using the definitions test as a lower extension of the vocabulary series compares favorably with prediction by means of the vocabulary series alone. In this particular population, overestimation of scale M.A.’s increased when this method of prediction was employed.

3. M.A.’s derived from the picture vocabulary items may be used to predict scale M.A.’s with a fair degree of accuracy, particularly among subjects with chronological ages of less than 4-1. There is a tendency to underestimate scale results in this method of prediction.


4. M.A.’s derived from the combined picture vocabulary and vocabulary series may be used in predicting scale results with fair accuracy. Similarly, mental ages derived from the picture vocabulary, definitions test, and vocabulary items combined may be used to predict scale results with fair accuracy. There is some tendency for the M.A.’s derived from either of these combinations of items to underestimate actual scale M.A.’s.











With the possible exception of the vocabulary mental age, our data do not justify the use of any portion of the Stanford-Binet scale in predicting for individuals. It is true that there are marked relationships between certain portions or combinations of portions of the scale and results from the administration of full or abbreviated testing. However, these relationships are not great enough to justify the use of a small portion of the scale as a substitute for more complete testing. Comparisons of results on the vocabulary test and on the fuller testing should prove illuminating in cases of … degree of development in an important area of …, bilingualism, speech disturbances, foreign birth, and verbalism.


BIBLIOGRAPHY 


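1. Terman, Lewis M., and Merrill, Maud A. Measuring Intelligence. Boston: Houghton Mifflin Company, 1937.
2. Elwood, … "A Preliminary Note on the Vocabulary Test in the Revised …" Journal of Educational Psychology, XXX (November, 1939).
3. … "… Form of the Revised Stanford-Binet Scale, Form L." Journal of Consulting Psychology, VI (March-April, 1942).
4. … "… Testing with the Revised Stanford-Binet Scale, Form L." American Journal of Orthopsychiatry, XI (January, 1941).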











A STUDY OF THE RELATIVE VALUES OF TWO MODIFICATIONS
OF THE TRUE-FALSE TEST

FRANCIS D. CURTIS
University of Michigan
WESLEY C. DARLING and NINE HENRY SHERMAN
University of Michigan High School


Editor's note: One of the very commonly employed types of objective test is the true-false test. The authors present a modification of the conventional test in this area.
SOME years ago, one of the authors, in collaboration with a colleague, published a study of the relative merits of the conventional true-false test and of a modification of this test.¹ With this modification, the pupils were required not only to decide whether a statement were true or false but, if it were false, to make it true by crossing out the incorrect word or phrase in the original statement and then substituting a correct word or phrase. No credit was given for a false item which was merely indicated to be false but was not corrected, or for a false item which was "corrected" by the mere insertion of the word "not." The authors concluded the report of their investigation with these statements:


"From this investigation it seems reasonable to conclude that this modified form of the true-false test is more difficult than the conventional true-false. It is inferior in that it requires more time to administer and correct than the old form; but it is superior to the true-false in the following respects: (1) It is to a much greater extent a power test. (2) It possesses … value for diagnosis of individual and class difficulties. (3) It gives a … basis for homogeneous grouping over subject-matter units. (4) It is reliable since, in so far as elementary [secondary-school] science is concerned, it has a reliability comparable with the best of the other 'new-type' tests. (5) It is more popular with the abler pupils because of their belief in its superior merits. (6) To all intents and purposes it eliminates the element of chance."
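The credit rule for false items in this first modification (no credit for an item merely marked false, or for one "corrected" by inserting "not") can be expressed as a small scoring function. This is a sketch under stated assumptions: the pupil's response is represented as the rewritten statement, and the answer key of acceptable corrections is hypothetical, as are the function names.

```python
def is_not_insertion(original, corrected):
    """True if `corrected` equals `original` with one extra word 'not' inserted."""
    orig_words = original.lower().split()
    corr_words = corrected.lower().split()
    if len(corr_words) != len(orig_words) + 1:
        return False
    return any(
        word == "not" and corr_words[:i] + corr_words[i + 1:] == orig_words
        for i, word in enumerate(corr_words)
    )


def credit_false_item(original, corrected, acceptable):
    """Score one false item: 1 for an accepted correction, otherwise 0.

    corrected: the pupil's rewritten statement, or None if the item was only marked false.
    acceptable: hypothetical answer key of corrected statements the teacher accepts.
    """
    if corrected is None:
        return 0  # merely indicated to be false: no credit
    if is_not_insertion(original, corrected):
        return 0  # made true only by inserting "not": no credit
    accepted = {a.lower() for a in acceptable}
    return 1 if corrected.lower() in accepted else 0


item = "The first month of the year is September"
key = {"The first month of the year is January"}
print(credit_false_item(item, "The first month of the year is not September", key))  # 0
```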


Following the investigation, this modification of the true-false test was used extensively in the Department of Science of the University of Michigan High School. In spite of its advantages over the conventional true-false test, however, its use was attended with some dissatisfaction for these reasons: scoring was slow and more or less difficult, because corrections of the false items were made by the pupils at various places in the items; and the scoring was to some extent subjective, since several pupils often made different acceptable corrections of a single false item.

¹ McClusky, Howard Y., and Curtis, Francis D. "A Modified Form of the True-False Test." Journal of Educational Research, XIV (1926), 214-224.
To eliminate these difficulties, a second modification was tried. In this, a word or words were underlined in every item. The pupils were then told to indicate the true statements by writing "R" in the blank at the right of each, and to make each false item true by marking a cross through the word or words that needed to be changed and then writing in the blank at the right of the item the word or words to be substituted for those crossed out. The pupils were not permitted, however, to change any underlined word. Examples of this type of test follow:


5. The star nearest the earth is the sun .......... 5
6. An eclipse of the moon occurs only at new moon .......... 6

The pupil marked these items thus:

5. The star nearest the earth is the sun .......... R 5
6. An eclipse of the [moon, crossed out] occurs only at new moon .......... sun 6


This modification (Form II) was found to be an improvement over the earlier one, both because it provided for all scoring on the right-hand margin of the paper and because it increased the objectivity of the false items. Its extensive use in the classroom, however, revealed that pupils sometimes found difficulty in changing an item, even when they recognized that it was incorrect, because they could not decide what word or words to change from among those that were not underlined. In other words, the test occasionally tested the pupil's intelligence rather than his knowledge of science, thereby reducing its validity.


The authors' attention was later directed to a third modification of the true-false test, which seemed likely to eliminate the objection just stated while at the same time retaining the advantages of the second modification. In this newer form (Form III), at least one word is underlined in each statement, and the pupil is required to make each false statement true by changing one or more of the underlined words.


Examples of items of Form III follow:

1. The star nearest the earth is the sun .......... 1
2. The eclipse of the moon occurs only at new moon .......... 2





TWO MODIFICATIONS OF TRUE-FALSE TEST 


The pupil marked these items thus:

1. The star nearest the earth is the sun .......... Right
2. The eclipse of the [moon, crossed out] occurs only at new moon .......... sun


The report of a comparative study of these last two modifications of the true-false test follows.
METHOD 
The subjects of this investigation were 455 pupils enrolled in twenty classes in the Department of Science of the University of Michigan High School. By combining classes of the same grade, the twenty classes were reduced to fourteen groups. Thus there were three groups of seventh-grade general science, two groups of eighth-grade general science, two groups of ninth-grade advanced junior science, two groups of tenth-grade biology, three groups of eleventh-grade chemistry, and two groups of twelfth-grade physics.


Preceding the administration of the tests for each part of the investigation, the pupils in every participating class were given careful instruction and extensive practice in taking both modifications of the true-false test. For the investigation, a test of fifty items was prepared for each class. The items were mimeographed on two sheets, with no changes whatever in the wording of the items, but with different words underlined in the same item on the two sheets, and with the order of the items on the two sheets varied as much as possible.
The following directions were given at the top of Sheet I:

If a statement is true as it stands, write "Right" in the blank at the right of the statement. If a statement is false, make it true by marking a cross through the word or words you need to change in order to make the statement true, and by then writing in the blank the word or words you wish to use in place of the word or words you have marked with crosses. DO NOT CHANGE ANY WORDS THAT ARE UNDERLINED. For example,

1. The largest city in the United States is New York .......... 1
2. The first month of the year is September .......... 2

You would mark these as follows:

1. The largest city in the United States is New York .......... Right 1
2. The first month of the year is [September, crossed out] .......... January 2










The following directions were given at the top of Sheet II:

If a statement is true as it stands, write "Right" in the blank at the right of the statement. If a statement is false, make it true by marking with a cross one or more of the underlined words and by then writing in the blank the word or words you wish to use in place of the underlined word or words crossed out. DO NOT CHANGE ANY WORDS THAT ARE NOT UNDERLINED. For example,

1. The largest city in the United States is New York .......... 1
2. The first month of the year is September .......... 2

You would mark these as follows:

1. The largest city in the United States is New York .......... Right 1
2. The first month of the year is [September, crossed out] .......... January 2
It will be noted that the essential difference between these two modifications of the true-false test is that with Form II (Sheet I) the pupil is required to decide whether a statement is true or false and then, if it is false, to make it true by changing one or more words which are not underlined, while with Form III (Sheet II) the pupil is required to decide whether a statement is true or false and then, if it is false, to make it true by changing one or more words which are underlined.
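The distinction between the two forms amounts to a rule about which words a pupil may cross out. A Python sketch of that check follows; representing an item as a list of underline flags plus the set of crossed-out word positions is an assumption made for illustration, and the function name is hypothetical.

```python
def change_is_permitted(underlined, crossed, form):
    """Check a pupil's crossed-out word positions against the rule of the form used.

    underlined: one boolean per word of the item, True where the word is underlined.
    crossed: set of word positions (0-based) the pupil crossed out.
    form: "II" (only words that are NOT underlined may be changed) or
          "III" (only underlined words may be changed).
    """
    if form not in ("II", "III"):
        raise ValueError("form must be 'II' or 'III'")
    if not crossed:
        return False  # a false item cannot be made true without changing a word
    changes_must_be_underlined = form == "III"
    return all(underlined[i] == changes_must_be_underlined for i in crossed)


# "The first month of the year is September" (8 words), only "September" underlined
flags = [False] * 7 + [True]
print(change_is_permitted(flags, {7}, "III"))  # True
print(change_is_permitted(flags, {7}, "II"))   # False
```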

Both Sheet I and Sheet II were administered to the members of each class during a single examination period, one sheet being given to the pupil after he had completed and handed in the other. In order to equalize practice effects, each sheet was administered first to about half of the classes.

The time required for taking each form of the test was recorded for each of the eleven classes in which it was practicable to do so.


FINDINGS 


Table I indicates that neither form of the modified true-false test possesses an important advantage over the other with respect to the amount of time required for administering the tests. An inspection of the data reveals, however, that each class required considerably more time in completing whichever form was administered to it first. Thus the 12½ percentile pupil required on the average about 10 minutes more with the first form administered than with the second; the median pupil, an average of about 12 minutes more; and the 87½ percentile pupil, an average of about 13











TABLE I

MINUTES REQUIRED BY THE 12½ PERCENTILE PUPIL, THE MEDIAN PUPIL, AND THE 87½ PERCENTILE PUPIL OF EACH CLASS WITH EACH OF THE MODIFIED FORMS OF THE TRUE-FALSE TEST

                          Form II                             Form III
         Number    12½                  87½         12½                  87½
Grade    of        Percentile  Median   Percentile  Percentile  Median   Percentile
         Pupils    Pupil       Pupil    Pupil       Pupil       Pupil    Pupil
2 XII 16 18 26* 29+ XK 12 17 
XII 16 12 14 21* 25% $1" 
XI 28 r 11 14 15* 20+ 24* 
XI 2 13" 22% 26* 7 9 il 
Xx 21 i 21 2 30* 35* 
1 VIII 23 13" 17* 30* 9 12 20 
VIII 20 11 15 i 25* 30* 37* 
VII 24 12 15 21* 26* 29* 
: VII 24 25* 1+ i3 18 25 
Vil 25 12 15 26* 30* 95* 
VII 25 19° 22" 11 13 is 
10% ‘ VII-XTI 49 i2 17 10 20 30 
a ; VII-XII 249 14’23” | 18 23 9” | 20° 1 25’ 16 
if ° ] 
a VII-XII 228 14’ 47 18’ 42 23’ 22 27 19’ 23 24’ 23 
* Indicates the test form administered first to each respective class.
No times were recorded for Grade IX.
All classes were combined here in a single distribution.
The Grade X class was omitted in computing the weighted averages because there was no other tenth-grade class taking Form II first with which to balance it.
minutes more. The 12½ percentile pupil of all the classes combined required … minutes longer to complete Form II than to complete Form III; but the median pupil and the 87½ percentile pupil required respectively three and four minutes longer to complete Form III. It should be noted that six of the eleven classes took Form III first. It seems reasonable to infer that, had there been another class taking Form II first, the difference in time required for taking the two forms would have been substantially less.
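The article reports times for the 12½ percentile, median, and 87½ percentile pupil of each class but does not say how those points were located. One way to obtain them from a class's completion times is Python's `statistics.quantiles` with `n=8`, whose cut points fall at 12.5, 25, ..., 87.5 percent; the default interpolation method used here is an assumption, and the function name is hypothetical.

```python
from statistics import quantiles


def time_percentiles(minutes):
    """Return (12.5th percentile, median, 87.5th percentile) of completion times.

    quantiles(..., n=8) yields seven cut points at 12.5%, 25%, ..., 87.5%.
    The default 'exclusive' interpolation is an assumption; the article does
    not state how its percentile pupils were located.
    """
    cuts = quantiles(minutes, n=8)
    return cuts[0], cuts[3], cuts[6]


print(time_percentiles(list(range(1, 16))))  # (2.0, 8.0, 14.0)
```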
Although somewhat questionable from a statistical standpoint, a comparison was made of the weighted averages² of the time requirements.

² The time for the 12½ percentile pupil of each class was multiplied by the number of pupils in that class. The sum of the eleven products thus obtained was divided by the total number of pupils, 249, to give the weighted-average time. Weighted averages for the 12½ percentile pupil, the median pupil, and the …






















These weighted averages indicate a small but consistent time advantage in favor of Form II: nearly two minutes for the 12½ percentile and the median pupils, and slightly more than two minutes for the 87½ percentile pupil. When, however, the weighted averages are computed for an even number of classes taking each test first, the time differences are reduced.


TABLE II

COMPARISON OF THE SCORES FROM TESTS OF MODIFIED FORMS II AND III OF THE TRUE-FALSE TEST

Group   Grade   Number of Pupils   Form   Range
\ XII 16 *II 28-46 
II! 32-49 
B XII 16 I 24—48 
*IIl 20-49 
é x! 28 II | 17-42 
*ITl 16—42 
XI 2 “11 =|: 19-40 
Ill 19-43 
M x! a7 *II 13-48 
N Ill | 16-49 
| 

E | x 2 II 9-40 
“III | 19-41 
Oo x $8 til 16-46 
III | 13-48 
L IX 11 | 19-46 
| Im | 22-44 
P| Ix 20 *II 22-42 
II | 21-42 

| | 
I | VIII 43 #11 41 
G Il 24-43 
Q VIII 60 *II | 16-43 
R Ill 20-45 
H Vil 18 #IT 1541 
I III | 1941 
VII 3¢ II | 13-47 
Ii! 13-48 
J VI 0 #11 8-40 
K II 12-43 
All VII-XII 455 Ir | 848 
III | 12-49 


* Indicates which form was administered first. 
# Indicates that each form was administered first, half of the time. 


** Indicates that no record was kept as to which form was administered first. 


29. 67 
31. 04 


33. 19 
34. 49 


29. 10 


31. 08 | 


82. 06 


00 


94 


1R¢ H 


(I 


} 2 " 
al { \ 


ForMsS II AND III 


MIII-MII 


SDDm 


76 | 


- 41 


18 
58 


86 


61 


969 


. 377 


| 
| SDIII- 


904 



















These latter weighted averages show an advantage of Form II over Form III of less than a minute for both the 12½ percentile and the median pupil, and of about a minute for the 87½ percentile pupil.
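The weighted averages described in the footnote multiply each class's time by its number of pupils and divide the sum by the total number of pupils (249 for the eleven timed classes). A minimal sketch, with a hypothetical function name and made-up class figures:

```python
def weighted_average_time(class_times, class_sizes):
    """Sum of (class time x pupils in class) divided by the total number of pupils."""
    if len(class_times) != len(class_sizes) or not class_sizes:
        raise ValueError("need one class size for each class time")
    total_pupils = sum(class_sizes)
    return sum(t * n for t, n in zip(class_times, class_sizes)) / total_pupils


# Hypothetical median-pupil times (minutes) for three classes of 16, 28, and 20 pupils
print(weighted_average_time([18, 15, 21], [16, 28, 20]))  # 17.625
```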


Table II presents evidence to indicate that Form II is somewhat more difficult than Form III. There is little difference in the ranges of scores with the two forms within the individual groups, but when all classes are combined, there is a slightly greater range of scores with Form II (8-48) than with Form III (12-49). More convincing evidence of the greater difficulty of Form II is shown by the critical ratios.³ Twelve of the fourteen groups have critical ratios in favor of Form III, indicating greater difficulty of Form II, while only two groups similarly indicate a slightly greater difficulty of Form III. None of the critical ratios, however, is statistically significant. Yet the fact that the critical ratio obtained from the scores of the combined group of 455 pupils is 1.899 indicates 971 chances in 1000 that, if the investigation were repeated, the scores on Form III would again be higher than those on Form II.
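The critical ratio divides the difference between the Form III and Form II means by the standard deviation of that difference; the footnoted "partial" formula ignores the correlation between forms taken by the same pupils. Reading the chances in 1000 as the normal-curve area below the absolute critical ratio reproduces both figures quoted in the text (971 for 1.899 and 817 for .904); that reading is inferred from the numbers, not stated by the authors. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def critical_ratio_of_means(mean_ii, sd_ii, n_ii, mean_iii, sd_iii, n_iii):
    """(M_III - M_II) divided by the partial-formula SD of the difference.

    The partial formula sqrt(SD_III**2 / N_III + SD_II**2 / N_II) omits the
    correlation term, even though the same pupils took both forms.
    """
    sd_diff = math.sqrt(sd_iii ** 2 / n_iii + sd_ii ** 2 / n_ii)
    return (mean_iii - mean_ii) / sd_diff


def chances_in_1000(cr):
    """Normal-curve reading of a critical ratio: 1000 * Phi(|CR|), rounded."""
    return round(1000 * NormalDist().cdf(abs(cr)))


print(chances_in_1000(1.899))  # 971, the figure given for the combined group
print(chances_in_1000(0.904))  # 817, the figure given for the SD comparison
```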


The evidence offered by Table II on which form differentiates better between the good and the poor pupils is conflicting and inconclusive, though on the whole it is to some extent favorable to Form II. With all but two groups, B and D, the standard deviations with Form II are somewhat greater than those with Form III; thus a slight superiority in favor of Form II is indicated. When, moreover, the difference of the standard deviations is divided by the standard deviation of the differences of the standard deviations,* the results show slight but not statistically significant differences favoring Form II, with all groups except B and D. With all groups combined, moreover, the ratio of −.904 indicates 817 chances in 1,000 that if the investigation were repeated the differences would again indicate an advantage in favor of Form II.

$$SD_{D_m} = \sqrt{\left(\frac{SD_{II}}{\sqrt{N_{II}}}\right)^{2} + \left(\frac{SD_{III}}{\sqrt{N_{III}}}\right)^{2}}$$

For convenience, this partial formula is used here. The complete formula would likely tend to increase somewhat, and therefore to make somewhat more significant, the results obtained from this partial formula.

*
$$D_{\sigma} = SD_{III} - SD_{II}, \qquad SD_{D_{\sigma}} = \sqrt{\left(\frac{SD_{II}}{\sqrt{2N_{II}}}\right)^{2} + \left(\frac{SD_{III}}{\sqrt{2N_{III}}}\right)^{2}}$$

See also preceding footnote.
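The ratio for a difference between two standard deviations can be sketched as follows. This is a minimal Python illustration of the partial formula in the footnote: the standard error of a standard deviation is taken as SD/√(2N), and the correlation term is omitted. The standard deviations below are hypothetical, because the Table II values are not legible here, and the function names are illustrative.

```python
import math


def se_of_sd(sd: float, n: int) -> float:
    """Standard error of a standard deviation: SD / sqrt(2N)."""
    return sd / math.sqrt(2 * n)


def cr_difference_of_sds(sd_ii: float, n_ii: int, sd_iii: float, n_iii: int) -> float:
    """Critical ratio for D = SD_III - SD_II by the 'partial' formula:
    the correlation between the two sets of scores is ignored."""
    se_diff = math.sqrt(se_of_sd(sd_ii, n_ii) ** 2 + se_of_sd(sd_iii, n_iii) ** 2)
    return (sd_iii - sd_ii) / se_diff


# HYPOTHETICAL standard deviations (the Table II values are not legible here).
# A negative ratio means Form II spread the pupils more widely.
cr = cr_difference_of_sds(sd_ii=6.0, n_ii=455, sd_iii=5.6, n_iii=455)
print(round(cr, 3))
```

Because the correlation term is dropped, the denominator is larger than the complete formula would give when the two sets of scores are positively correlated. This agrees with the footnote's remark that the complete formula would make the results somewhat more significant.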


Table III furnishes a crude measure of the validity of both forms of these modifications of the true-false test. The pupils in each grade were divided into ability groups on the basis of teacher judgment, without reference to the scores made on these tests. An inspection of the mean scores made on both forms by these groups indicates that both forms of the test are valid, since each form differentiates consistently between the best, the medium, and the poorest pupils in each class.

The data in Table III also seem to indicate that the two modifications of the true-false test are about equally valid, since neither has a significant or a consistent advantage in differentiating between the best, the medium, and the poorest pupils. In not a single group is there a progressively smaller difference between the mean scores of the three groups of pupils within the class on the two tests. If, with all the classes, there had been a similar, progressively decreasing difference between the mean scores of these three groups on the two tests, the evidence would indicate a consistent superiority of Form III over Form II as a means of differentiating between the good and the poor pupils. But the data fail to reveal any such consistent superiority. With four classes, however (D, H-I, M-N, and Q-R), there is a progressively larger difference between the mean scores of the three groups of pupils within the class on the two tests. This fact indicates a slight superiority of Form II over Form III as a means of differentiating between the good and the poor pupils.


Table IV furnishes data to indicate that the two modifications of the true-false test are of approximately equal reliability. The data show considerable variation in the reliability coefficients of the two forms when correlations are computed between the scores on the two halves (odd items vs. even items) of the tests.* When the reliability coefficient is computed for all fourteen groups combined as a single group, however, the coefficient is found to be .693 ± .016 for Form II and .698 ± .016 for Form III, a difference of only .005 in favor of Form III. This difference is too small


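* In calculating the correlations the following formula was used:

$$r = \frac{N\sum xy - \sum x \cdot \sum y}{\sqrt{\left(N\sum x^{2} - \sum x \cdot \sum x\right)\left(N\sum y^{2} - \sum y \cdot \sum y\right)}}$$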







TABLE III
COMPARISON OF THE MEAN SCORES ON FORMS II AND III OF VARIOUS GROUPS OF
PUPILS SELECTED ON THE BASIS OF TEACHERS' JUDGMENTS
WITHOUT REGARD TO TEST SCORES

              Mean Scores of Best    Mean Scores of Middle   Mean Scores of Poorest
Form          20 Percent of Pupils   60 Percent of Pupils    20 Percent of Pupils

II            44.67                  42.70                   30.33
III           46.33                  42.70                   32.33
Difference     1.66                   0.00                    2.00

II            44.33                  42.50                   31.33
III           43.67                  41.50                   30.67
Difference      .66                   1.00                     .66

II            39.50                  30.44                   22.00
III           38.67                  29.50                   22.00
Difference      .83                    .94                    0.00

II            38.80                  29.35                   21.60
III           39.60                  30.76                   23.40
Difference      .80                   1.41                    1.80

II            33.50                  30.69                   19.50
III           34.25                  30.46                   23.50
Difference      .75                    .23                    4.00

II            41.00                  32.56                   22.67
III           42.33                  32.67                   27.33
Difference     1.33                    .11                    4.66

II            37.00                  33.96                   28.44
III           38.67                  34.08                   31.11
Difference     1.67                    .12                    2.67

II            32.80                  28.46                   22.50
III           33.40                  29.11                   23.80
Difference      .60                    .65                    1.30

II            35.20                  25.30                   16.60
III           36.10                  25.70                   17.80
Difference      .90                    .40                    1.20

II            43.71                  33.17                   22.71
III           44.00                  34.43                   25.14
Difference      .29                   1.26                    2.43

II            43.50                  35.32                   24.00
III           44.38                  36.23                   23.75
Difference      .88                    .91                     .25

II            39.75                  30.83                   24.00
III           39.50                  31.33                   24.75
Difference      .25                    .50                     .75

II            35.83                  29.11                   21.50
III           37.77                  31.86                   25.00
Difference     1.94                   2.75                    3.50

II            40.43                  30.55                   20.29
III           42.00                  31.00                   21.71
Difference     1.57                    .45                    1.42







TABLE IV
COMPARISON OF THE RELIABILITY COEFFICIENTS OF THE TWO HALVES, RESPECTIVELY,
OF FORMS II AND III OF THE MODIFIED TRUE-FALSE TEST

         Number      Reliability, r,    Reliability, r,
Group    of Pupils   of Form II         of Form III

A        16          only one value legible: .77
B        16          .83                .82
C        28          .75                .67
D        27          .75                .66
E        21          .33                .73
F        15          .88                .83
G        43          .52                .58
H-I      48          .44                .7?
J-K      50          .77                .75
M-N      37          .82                .74
O        38          .82                .?4
P        20          .73                .65
Q-R      60          .46                .45
?        36          .83                .83
All      455         .693 ± .016        .698 ± .016
| 





to permit the making of any definite conclusion regarding the relative reliabilities of the two forms of the true-false test.
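The odd-even procedure behind Table IV can be sketched in Python. Each pupil's odd-numbered and even-numbered items are totaled, and the two sets of half scores are then correlated with the raw-score formula in the footnote. The item responses below are hypothetical, and the helper names are illustrative rather than taken from the source.

```python
import math
from typing import List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Raw-score product-moment formula from the footnote:
    r = (N*Sxy - Sx*Sy) / sqrt((N*Sxx - Sx*Sx) * (N*Syy - Sy*Sy))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))


def odd_even_halves(item_scores: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Total each pupil's odd-numbered items (1, 3, 5, ...) and even-numbered items."""
    odd = [sum(items[0::2]) for items in item_scores]
    even = [sum(items[1::2]) for items in item_scores]
    return odd, even


# HYPOTHETICAL responses of four pupils to six items (1 = right, 0 = wrong).
pupils = [
    [1, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
odd, even = odd_even_halves(pupils)
r = pearson_r(odd, even)
print(odd, even, round(r, 3))  # [2, 2, 1, 3] [3, 1, 0, 3] 0.816
```

The coefficient obtained this way is the reliability of a half test, which is what Table IV reports. Table V then steps it up to a longer test.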

Table V indicates that both modified forms (Forms II and III) of the true-false test have reliabilities higher than the reliability of the conventional true-false test experimented upon by McClusky and Curtis. Both forms have reliabilities lower than those of the modification of the true-false test experimented upon by McClusky and Curtis, and lower than those of the conventional and modified forms of the multiple-response test experimented upon by Curtis and Woods.* Either modified form, however, may be considered sufficiently reliable to justify its use.

Several other statements seem justified by the extensive use of both of these modifications of the true-false test, although no objective evidence supporting them can be presented here. The time required for the construction of the tests is the same in both modifications. The scoring of Form II requires a slightly longer time and is slightly more subjective than that of Form III. Both modified forms seem to develop in the pupils, to a greater extent than does the conventional true-false type, the habit of critical scrutiny and careful analysis of every item; and both tend to a certain extent to eliminate the element of guessing.


Op. cit.

* Curtis, Francis D., and Woods, Gerald G., "A Study of a Modified Form of Multiple-Response Test," Journal of Educational Research, XVIII (1928), 211.

























TABLE V
COMPARISON OF THE RELIABILITIES OF ONE HUNDRED ITEMS OF CERTAIN TYPES OF
TRUE-FALSE AND MULTIPLE-RESPONSE TESTS WITH RELIABILITIES OF ONE
HUNDRED ITEMS OF THE AUTHORS' FORM II AND FORM III OF
THE MODIFIED TRUE-FALSE TEST, COMPUTED BY BROWN'S
FORMULA* AND CORRECTED FOR DISPERSION

                                      Number of   Reliability,      Reliability,
Test                                  Pupils      Half vs. Half**   100 Items***

McClusky-Curtis
  True-False (conventional)           115         .53 ± .045        .764
  Modified Form                       115         .76 ± .026        .904
Curtis-Woods
  Multiple-Response (conventional)    206         .70 ± .024        .859
  Modified Form                       206         .73 ± .022        .851
Form II
  Modified True-False                 455         .693 ± .016       .805
Form III
  Modified True-False                 455         .698 ± .016       .825

** Twenty-five items each for McClusky and Curtis' test; sixty items each for Curtis and Woods' test; [...] items each for the authors' tests.

*** Corrected for dispersion by means of the formula

$$\frac{\sigma}{\Sigma} = \frac{\sqrt{1 - R_{1I}}}{\sqrt{1 - r_{1I}}}$$

(Kelly, Truman L., Statistical Method. New York: The Macmillan Company, 1923, p. 221.)
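Table V rests on two adjustments. The first is Brown's formula (the Spearman-Brown formula), which predicts the reliability of a test n times as long from the reliability r of the shorter one: r_n = nr / (1 + (n − 1)r). The second is Kelly's correction for dispersion, which assumes σ√(1 − r) = Σ√(1 − R) between a group with spread σ and a group of standard spread Σ. Below is a minimal Python sketch of both, assuming those standard forms. The half length of 50 items and both standard deviations are hypothetical, since the footnote giving the half length for the authors' tests is illegible, so the output is not meant to reproduce Table V.

```python
def brown(r: float, n: float) -> float:
    """Brown's (Spearman-Brown) formula: reliability of a test n times as long."""
    return n * r / (1 + (n - 1) * r)


def correct_for_dispersion(r: float, sigma: float, sigma_std: float) -> float:
    """Kelly's correction: solve sigma*sqrt(1 - r) = sigma_std*sqrt(1 - R) for R,
    the reliability expected in a group whose standard deviation is sigma_std."""
    return 1 - (sigma / sigma_std) ** 2 * (1 - r)


# Half-vs-half coefficient for Form II (Table IV), stepped up to 100 items.
# HYPOTHETICAL: assumes each half contained 50 items, so n = 100 / 50 = 2.
half_r = 0.693
r_100 = brown(half_r, 100 / 50)
print(round(r_100, 3))  # 0.819

# HYPOTHETICAL spreads: SD 5.0 in the group tested, SD 5.5 in the standard group.
print(round(correct_for_dispersion(r_100, sigma=5.0, sigma_std=5.5), 3))
```

The correction raises the coefficient when the group tested is less variable than the standard group and lowers it when the group is more variable. This is why corrected values in Table V need not follow the order of the half-vs-half coefficients.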


SUMMARY 


In so far as the results of this investigation may be indicative, neither of the modified forms of the true-false test possesses marked advantages over the other. They are of about equal difficulty for the pupils, though Form II is slightly more difficult than Form III. Both modified forms require about the same time to construct, administer, and correct; the scoring of Form III is somewhat more objective than that of Form II. Both forms differentiate about equally well between the good and the poor pupils. Both forms are equally valid and equally reliable. Both modified forms tend to stimulate thinking on the part of the pupil while he is taking the test, and both forms tend to eliminate whatever guessing factor may operate in the conventional true-false test. The teachers who took part in this investigation, after long continued trial of both forms, have come to prefer Form III.











MEASURING KNOWLEDGE OF PRECISE WORD MEANING

LEE J. CRONBACH
THE NEED FOR IMPROVED VOCABULARY TESTING 
ONE of the significant problems of educational measurement is the need for improved vocabulary tests. A knowledge of technical vocabulary is prerequisite to comprehension of textbooks and discussions, and is often the key to basic ideas in the subject. The teacher who seeks to build concepts in the pupil's mind is constantly confronted with the problem of determining how well the pupil understands each term, yet vocabulary tests in the past have rarely been diagnostic. Furthermore, they have generally assumed that a word is either known or unknown, so that a simple test can determine the percentage of the class knowing each word. More careful analysis of the problem of word knowledge has suggested that a student may know a word more or less well. Testing should therefore determine the degree to which his understanding is complete, rather than say that he "knows" or "does not know" the word. In other words, it is important to determine how precise a concept he has acquired.¹

Many tests require the pupil to respond with definitions or synonyms (that is, other words) instead of determining whether each word has meaning for him in life situations where he must use it. The pupil may know the definition verbally, without having the ability to apply it properly. He may associate a word with some situations, but lack a broad enough generalization of its meaning to recognize other situations to which it applies; contrariwise, the pupil's concept may be insufficiently refined, so that he includes too many things within it. An effective measure of his knowledge of the concept must check on his ability to apply the concept, perhaps in situations he has never before encountered. The value of technical terms, as in science, is that they make possible generalizations, so that knowledge can be transferred to new problems. The boy who can locate the minuend in the subtraction example

      7
    − 2

but cannot apply the term in the example 8 − 4 has a concept far from full usefulness. It follows that the ideal vocabulary test will determine reliably, for each individual word, whether the student can apply the word in every situation where it would be helpful to his thinking.

¹ [...] in an earlier article: L. J. Cronbach, "An analysis of [...] vocabulary testing," Journal of Educational Research, in press.





BEHAVIOR INDICATING FUNCTIONAL WORD KNOWLEDGE 


From an examination of the behavior which indicates true understanding of words we may determine how to devise an appropriate test. Thorndike² pictures the method by which we teach a concept in three steps. First, we point out the element for examination. Second, we point out the element accompanied by varying concomitants; in teaching fiveness, for example, we link "five boys," "five [...]," etc. Third, we present the element in contrast with other concepts which might produce confusion; in teaching one-fifth, for example, we contrast that [...].

Understanding may well be tested from the same psychological viewpoint, by asking the student to recognize when the element is present, whatever the concomitants, and to recognize when it is absent. The student who is to obtain greatest value from the rule for the area of a parallelogram must know that it applies to a rectangle, a square, a rhombus, or a "diamond," but not to a trapezoid, an irregular quadrilateral, a hexagon, or a figure not lying in a plane. The problem, then, simplifies to one of presenting illustrations of the word tested, together with situations which the student might incorrectly name with the word, and asking him to distinguish between them.


A NEW TECHNIQUE FOR MEASURING THE DEGREE OF UNDERSTANDING OF WORDS


A test of this sort calls for a yes-no mind set, as in the true-false exercise. The multiple true-false form³ is particularly well adapted to the requirements of the problem. An illustration of the form, adapted for vocabulary testing, is:

Noun:
table
make
height
we
Washington

The student responds by marking "true" each item illustrating the given word, and marking "false" each item to which the word does not apply.

² Thorndike, E. L., Educational Psychology, Briefer Course (New York: Teachers College, Columbia University, 1922), pp. 159-161.

³ Cronbach, L. J., "Note on the multiple true-false test exercise," Journal of Educational Psychology, XXX (November, 1939), pp. 628-631.

This technique is adaptable to a wide variety of subjects. An example from the field of grammar has been given; further examples from chemistry, algebra, geography, and the social studies are given below to illustrate something of the possible variations.


Element:
brass 
iron 
water 
sulfur 
fire 
oxygen 


Linear equation:
[sub-items illegible in the source copy]

Coefficient of x:*
[sub-items illegible in the source copy]

* Directions are modified, so that the student marks the sub-item true if the italicized portion is the "coefficient of x."


Parallelogram:
[figures not reproduced in the source copy]

Continent:
Africa
North America
Europe
Europe and Asia combined
Greenland
the United States


Middle class (member of):
a schoolteacher making $100 per month
the owner of a small grocery
a doctor earning around $6000 per year
a farmer earning about $700 per year and his keep on his own land
a factory laborer making $110 per month
a college student earning his own expenses


The number of illustrations presented per word depends upon the circumstances of testing; the use of six alternatives in the above examples does not imply that that number is uniformly desirable. In general, of course, the greater the number of items, the more satisfactory the measure.

One factor affecting the number of items needed is the number of sorts of confusion or error in applying the term. Thus, a group of college seniors, many of them social science majors, were asked to prepare illustrations of middle class and non-middle-class persons; the items under middle class, above, are drawn from these lists. The total range of meaning the group gave to middle class could be accurately represented only by a [...] or more of items. Some differentiated on the basis of salary (using many different division points), some on occupational classification, some on amount of culture, some on mental health and happiness, and some on combinations of these.
ADVANTAGES OF PROPOSED TEST FORM 


These differences call attention to a major advantage of the proposed vocabulary test. Many words, especially in the social studies, present a vocabulary problem insufficiently recognized in available tests: for many words there is no "correct" meaning, varied meanings being used by different persons. Personality, in psychology, is a good example. It is clear, also, that meanings may shift from time to time, as is evidenced by the term Western Hemisphere in recent world politics. For terms like this, it is important for the teacher to recognize that different students may have different meanings for the term. Where speaker or writer, and hearer or reader, are not alert to this possibility, the result is communication failure and bad education. Where there is no one correct meaning, it may yet be important to learn just which meaning (if any) each student accepts, in order to identify and eliminate possible confusion. A test on the term middle class might well be a starting point for effective teaching to enable members of the group described above to discuss the topic more lucidly.

Measurement of what concept each person links with a word, rather than of how much of the "correct" concept is known, may be valuable; this is suggested by an investigation of knowledge of the word function (in the mathematical sense). A test of the true-false type was marked by several algebra teachers. The results demonstrated that even for this word, chosen from a field far more exact than the social studies, different teachers have and teach meanings differing in major particulars. The pupil who has learned one meaning from Teacher A will find difficulty when he goes on for further work in the classroom of Teacher B, whose concept differs. Here again, use of a diagnostic test for the important terms might be of great value.

As a diagnostic test, the multiple-item form is superior to the customary form in which only one response per word is obtained, since, when one obtains five or more responses, the student's knowledge is more reliably measured. Furthermore, one obtains a score for each pupil on each word on a scale ranging from +6 to −6 (for a six-item exercise), which makes it possible to discriminate just how nearly the student's knowledge approaches completeness.
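Scoring on the +6 to −6 scale can be sketched as a right-minus-wrong count over the sub-items of one word. This minimal Python sketch uses the Element exercise above, with an answer key supplied here: brass, water, and fire are not elements, while iron, sulfur, and oxygen are. Counting an omitted sub-item as zero is an assumption, since the text does not say how omissions were handled.

```python
from typing import Optional, Sequence


def right_minus_wrong(key: Sequence[bool], answers: Sequence[Optional[bool]]) -> int:
    """Score one multiple true-false exercise: +1 per agreement with the key,
    -1 per disagreement, 0 for an omitted sub-item (None)."""
    if len(key) != len(answers):
        raise ValueError("key and answers must have the same length")
    score = 0
    for expected, given in zip(key, answers):
        if given is None:
            continue  # omission: assumed to count neither right nor wrong
        score += 1 if given == expected else -1
    return score


# Sub-items of the "Element" exercise: brass, iron, water, sulfur, fire, oxygen
ELEMENT_KEY = [False, True, False, True, False, True]

# A hypothetical pupil who marks "brass" as an element but answers the rest correctly
pupil = [True, True, False, True, False, True]
print(right_minus_wrong(ELEMENT_KEY, pupil))  # 5 right - 1 wrong = 4
```

A perfect paper scores +6 and a completely reversed one scores −6, which gives the range of the scale described above.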
LIMITATIONS OF THE PROPOSED TEST FORM

Certain limitations of the proposed approach must be recognized. It is not especially useful unless precision of meaning and inventory scores are important. There are important objectives in word learning not measured by the procedure suggested here. Even understanding a word completely, in all its senses, is not of value unless the pupil can judge from the context which meaning applies. The "availability" of a word (the extent to which the student uses it spontaneously in thinking) is also important. These aspects of concept formation have been discussed elsewhere.⁴
The proposed device presents difficulty with words which lack precise [...], or words which cannot be illustrated on the test blank by symbols or diagrams. This makes it difficult to test verbs, abstract nouns (viz., war), and words referring to large entities. It is possible to employ verbal descriptions in some such cases, but this plan may introduce the factor of reading comprehension, or of vocabulary difficulty in the description.

In interpreting scores it is easy to fall into the error of assuming that a student who answers five out of five sub-items correctly has a perfect concept. Actually, while his knowledge is probably superior to that of persons with lower scores, it does not follow that choices could not be presented to him which he would answer incorrectly. For most terms, one can find borderline cases which can be classified only with difficulty as belonging, or not belonging, to the class named. Unless such fine discrimination is likely to be needed by the students, it is probably wise to eliminate such illustrations from the test. As Lee points out, "Concepts develop gradually, [...] the concepts involved in most words are still not complete in adult [...]."⁵


ILLUSTRATIVE DATA FROM USE OF TEST 


Little formal experimentation has yet been performed with tests of the type proposed here, but the data so far obtained are suggestive. The writer constructed an algebra test dealing with thirty words of importance in elementary algebra. The words were selected on the basis of occurrence in [...] and appearance in curriculum lists by Pressey, Schorling, and Thorndike. Two hundred and nine students in Alameda and Oakland (California) schools were tested at the end of the first year in algebra; eighty of the papers, selected at random, were analysed in detail. On five items testing the term linear equation (the first five items in the illustration above), right-minus-wrong scores ranged from −3 to +5. Only fifteen of the eighty students had scores of +5. These facts, which are typical of the results from other words, imply, first, that there is a wide range in the precision with which basic terms are learned, and, second, that many such terms are mastered only by a small number of pupils. Insufficient evidence is available from this study to determine the validity of the multiple true-false technique for testing precision of vocabulary knowledge.

⁴ L. J. Cronbach, op. cit.

⁵ Lee, J. M., and Lee, D. M., The Child and His Curriculum (New York: D. Appleton-Century Co., 1940), p. 353.


SUMMARY 


It appears important in many situations to determine how precisely a student understands a word rather than whether he can pass a single-item test on the word. In other cases, one may wish to analyze objectively just what meaning a word has for a student, without necessarily implying that one meaning is correct. For either of these purposes, the multiple true-false technique appears to have potential advantages. Examples of the use of this test form in several subject fields were given.


The ultimate development of this or any other answer to the problems of vocabulary testing must be carried on through the work of specialists in the various subject fields. It is hoped that this presentation will lead workers in those areas to try the technique, so that more conclusive data on its validity and usefulness may be obtained.







AN EXPERIMENTAL STUDY OF TWO METHODS OF TEACHING 
BEGINNING READING: THE DIRECT VERSUS THE 
PREPARATORY APPROACH 


LEAH KATHRYN DICE 
Baltimore 





Editor's note: Much effort has been expended in recent years upon the experimental study of the teaching of reading. The author presents data that seem to support a direct method of wholes.

THE purpose of this investigation was to study experimentally the relative effectiveness of two methods of teaching first grade reading (the Direct Approach versus the Preparatory Approach) as they affect achievement in reading, as they contribute to the establishment of permanent interests in reading as a leisure activity, and as they yield satisfaction to the teacher and to the learner.
orkers 


The Direct Approach is a method of teaching first grade reading based on the philosophy that reading is much more than the sum-total of skills. It is, in reality, an "active dynamic process," during which the reader, urged by a definite purpose, creates from his rich experiential background a world of fact or fancy as he follows the author's printed directions. The three factors which make such reading possible are: (1) real vital experiences with things and people; (2) extracting and storing in memory the meaning of these experiences; and (3) having skill and expertness in picking up the writers' instructions and directions.¹ Methodology of the Direct Approach includes the following basic factors:

1. The initial period of general reading readiness activities is replaced by immediate meaningful experiences and reading from books.

2. Each selection read is treated as a literary whole, with the acquisition of reading skills subordinated to the getting and the retaining of meanings.

3. Skill and expertness in getting the author's directions are provided by means of skill-development activities supplementary to the reading process, including (a) the development of a meaningful sight vocabulary; (b) the independent recognition of words; (c) testing for comprehension of form, content, and meaning; and (d) oral reading practice in portraying meanings to others.

4. The year's instruction in reading is characterized by a series of wholes composed of (a) meaningful preparatory activities specific to the selection to be read and involving those experiences and vocabulary necessary for full appreciation of the selection; (b) reading of the selection as a whole, followed by the reading and understanding of its parts; (c) supplementary skill-development activities; and (d) re-reading the selection as a whole, for the retention and clarification of the acquired meanings of vocabulary and information, and for practice in portraying meanings to others.

¹ Adapted from:
Bamberger, Florence E., Reading: A Form of Living (Evanston, Ill.: Row, Peterson and Co., 1939).
Kerfoot, John B., How to Read (New York: Houghton Mifflin Co., 1916).


The Preparatory Approach is the usually accepted method of teaching first grade reading and is based on the philosophy that reading is a "mature reading habit" composed of expertness in various reading skills. It is the method currently in use in the areas in which this study was conducted. Methodology of the Preparatory Approach includes the following basic factors:

1. A period of general reading readiness activities at the beginning of the school year, designed to stimulate a desire for reading, to provide experiences which will aid in the preparation for reading, and to introduce a basic reading vocabulary through informal reading procedures.

2. A subsequent program of daily instruction in reading from books and supplementary reading materials.

3. Skill-development activities are a part of the basic reading program and are provided for through seat work, involving vocabulary drills for recognition, methods of independent recognition of words, and testing for comprehension.

4. The year's instruction in reading is characterized by (a) the period of general reading readiness, or pre-book program, and (b) a period of more or less formal reading instruction from books for the remainder of the school year.²

THE DIRECT VERSUS THE PREPARATORY APPROACH

When the two methods are contrasted, the following differences are noted:

² Adapted from: Courses of Study in Reading and the Manuals for the basic textbooks currently in use in the counties in which this experiment was conducted.










A. Differences Inherent in Methodology

The Direct Approach

1. The book is used at once, from the first day of school, and all reading activities are associated with the book and derived from the book. Thus remembered and newly acquired meanings are at once directly associated with reading material.

2. Reading readiness is provided for [...] specifically, and as it is needed to set the background for the reading of the selection. It [...] the proper appreciation and comprehension of meanings.

3. The reader is first introduced to the [...] as a whole so that he may read it [...] and meaningfully, in the light of his experiential background and the new concepts furnished. Thus the meaning and recognition of the parts may emerge from the understanding of the whole.

4. Supplementary, or skill-development, activities are planned to measure comprehension and appreciation in terms of [...] of meaning, with the rapidity of recognition of vocabulary as [...].

5. The first year's reading instruction is a series of wholes, composed of specific preparatory activities; reading activities of the whole first, then of parts; skill-development activities; and re-reading as a whole for appreciation and interpretation.

The Preparatory Approach

1. The use of the book is delayed in favor of a period of general readiness activities. Although skills, meanings, and vocabulary are being accumulated during this period, they are not associated with books until the period of book reading begins.

2. Reading readiness is general, designed to foster the skills, habits, and attitudes considered necessary for reading ability, which will function in a general way in all subsequent reading activity.

3. The story is frequently read in parts, preceded by vocabulary drills, and is not heard nor read as a whole until all the material has been presented. Thus experiencing the whole follows the development of the parts.

4. Supplementary work is planned to develop increased comprehension in terms of sight vocabulary and rapid word recognition, with a secondary emphasis on meanings.

5. The first year's reading instruction is divided into two general periods: one of general reading readiness activities, and one of book reading.

B. Differences in Psychological Background

The Direct Approach

1. This method is based on the concept of the "whole" as developed in Gestalt psychology. This whole is a distinct, unique entity, over and above its parts, more than the sum of its parts, and yet it includes the parts. By a process of individuation, the parts emerge from the total pattern, and the whole or background determines the properties of the parts. In other words, once the learner is able to perceive the whole, each part is then perceived in its relation to the whole and in its relation to each of the other parts, and takes on meanings derived from the whole. Thus words, phrases, etc., come into existence through an emergence process and take on meaning from the meaningful background from which they arise. So, from a broad field of experience a concept develops, then the word or words which represent this meaning, and finally the printed symbol.³

2. Skills emerge already related to and part of the total reading process, brought out by extensive reading of logically coherent and meaningful wholes.

The Preparatory Approach

1. Since this method is concerned with the development of a "mature reading habit," it is based on an application of the S-R bond theory of learning, in which learning to read is "primarily a process of establishing connections between the oral symbol, the visual symbol, and the meaning."⁴ Through properly motivated drills, etc., the connections thus established are strengthened, and rapid recognition results. Then, as familiarity and ability increase, the learner is able to recognize larger and larger spans of material. Continued success and directed practice provide the motivation. Thus, through motivated exercise, the learner responds to an ever-increasing number of small stimulus-response patterns which he builds into a [...] whole: a basic reading habit, or skill, or attitude, which, in turn, becomes a part of the desired mature reading habit.

2. Skills are definitely planned and provided for as part of the teaching process.

C. Differences in Philosophy and Desired Outcome

The Direct Approach

1. This method is designed first to afford vivid, usable concepts and meanings, and to attain skill through the immediate application of these meanings to materials to be read. In other words, it is believed that increased ability to get and appreciate meanings through extensive reading will lead to the development of reading skills.

The Preparatory Approach

1. This method is designed to foster and develop reading skills; from the successful exercise of these skills, the reader acquires concepts and meanings from the subject matter. In other words, it is believed that having read and comprehended (use of skill) will produce experience and enjoyment (meaning).

³ Wheeler, R. H., and Perkins, F. H., Principles of Mental Development (New York: Thomas Y. Crowell Co., 1932), pp. 16-28; 372-82; 379.

⁴ Stone, Clarence R., Better Primary Reading (St. Louis, Mo.: Webster Publishing Co., 1936), p. 158.

In order to determine the relative effectiveness of these two methods of reading instruction, two groups of children were selected from ten schools. The schools were paired on the basis of geographical location and the general socio-economic status of the community. The groups were then equated on those factors believed to affect reading progress causally: chronological age, mental age, reading aptitude, socio-economic status, kindergarten attendance, and initial ability in reading. Cases with observable physical defects and long periods of absence were excluded from the summaries of the data. Throughout the school year, one group was taught by the Direct Approach (experimental group), and the other by the Preparatory Approach (control group). Teachers were matched on training, on minimum number of years of experience in first grade, on supervisors' subjective ratings, and on ratings on the Torgerson Scale. Supervision was held constant throughout the experiment, as were (1) the amount of time given to reading instruction and (2) basal textbooks, while choice of supplementary reading materials was confined to selection from a list of materials available to all classes.


These two groups of children (the Experimental, taught by the Direct Approach, and the Control, taught by the Preparatory Approach) were compared on four measures:
(1) growth in word recognition of vocabulary specific to the basal textbooks used,³ and in recognition of the general primary reading vocabulary;⁴
(2) growth in ability to read and comprehend sentences;⁵
(3) ability to comprehend and successfully attack new and unfamiliar materials;⁶
(4) the percentage of times throughout the year of the experiment that books were chosen in free periods as a leisure activity.⁷

The tests involving specific vocabulary and sentence comprehension were administered to both groups quarterly. Tests on general vocabulary and new materials were given at the end of the year. The selection of books in free time was tabulated weekly throughout the year.


³ Dice, Leah Kathryn; Dice Word Recognition Test; An Experimental Study of Two Methods of Teaching Beginning Reading; Ed. D. Dissertation (Baltimore, Md.: Johns Hopkins University, June, 1941).
⁴ Gates, Arthur I.; Gates Primary Reading Test, Type 1: Word Recognition (New York: Bureau of Publications, Teachers College, Columbia University, 1927).
⁵ Gates, Arthur I.; Gates Primary Reading Test, Type 2: Sentence Reading (New York: Bureau of Publications, Teachers College, Columbia University, 1927).
⁶ Dice, Leah Kathryn; Paragraph Comprehension Test; An Experimental Study of Two Methods of Teaching Beginning Reading; Ed. D. Dissertation (Baltimore, Md.: Johns Hopkins University, June, 1941).
⁷ Compiled from data sheets prepared by teachers in the experiment showing the type or types of activity chosen by children in free periods.













Results were computed for total groups (experimental and control) and for three ability sections (low, average, and high) within each total group.
The results of the quarterly tests of vocabulary recognition indicate that:


1. When the experimental and control groups are considered as wholes, 
differences between means, with the exception of the results for the second 
quarter, consistently favored the experimental group. While none of the 
differences was statistically significant, they increased as the year progressed, 
with the greatest differences occurring at the end of the year. 

2. Differences for the low ability sections were in favor of the experimental group at the end of each quarter, and were statistically significant at the end of the year.

3. Differences for the average ability sections favored the control group 
for each quarter, but none of the differences was statistically significant. 

4. Differences for the high ability sections were negligible, favoring the 
experimental group for part of the year and the control for part, with none 
of the differences significant. 

Results of the final test of recognition of general primary reading 
vocabulary confirmed the findings for the tests of vocabulary specific to the 
texts used, since differences for total groups and ability sections, while not 
statistically significant, consistently favored the experimental group. 

The results of the quarterly tests of sentence comprehension indicate that:


1. When the experimental and control groups are considered as wholes, 
differences were negligible for the first quarter but increased in size as the 
year progressed and consistently favored the experimental group. Differences 
for the second, third, and fourth quarters exceed three times the probable 
errors of the differences and therefore approach statistical significance. 

2. Differences for the low ability sections favored the experimental group throughout the year. The differences were statistically significant at the third and fourth quarters, and the difference for the second quarter approached significance.

3. Differences for the average ability sections favored the control group 
for the first three quarters of the year, the difference for the third quarter 
being statistically significant. The difference at the end of the year favored 
the experimental group, with a difference approaching significance. 

4. Differences for the high ability sections favored the control group for the first quarter and the experimental group for the remainder of the year, the difference for the second quarter being statistically significant.
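The findings above use two different yardsticks. The study's table of critical ratios treats a critical ratio of 3 or more as statistically significant. The text, by contrast, says that differences exceeding three times their probable errors only approach significance.

The sketch below shows how the two yardsticks relate. It assumes a normal sampling distribution and the conventional relation PE = 0.6745 × standard error. The difference and standard error passed in are hypothetical, and this is not the author's own computation.

```python
import math

# Probable error (PE) of a normally distributed statistic = 0.6745 x its standard error.
PE_FACTOR = 0.6745


def critical_ratios(difference: float, se_difference: float) -> tuple[float, float]:
    """Express a difference in standard-error units and in probable-error units."""
    pe_difference = PE_FACTOR * se_difference
    return difference / se_difference, difference / pe_difference


def two_sided_p(ratio_in_se: float) -> float:
    """Normal probability of a chance deviation at least this many SEs, either direction."""
    return math.erfc(abs(ratio_in_se) / math.sqrt(2))


# Hypothetical difference between means and its standard error.
cr_se, cr_pe = critical_ratios(difference=2.4, se_difference=1.0)
print(f"difference = {cr_se:.2f} standard errors = {cr_pe:.2f} probable errors")

# Three probable errors is only about two standard errors.
three_pe_in_se = 3 * PE_FACTOR
print(f"3 PE = {three_pe_in_se:.3f} SE, two-sided p = {two_sided_p(three_pe_in_se):.3f}")
print(f"3 SE: two-sided p = {two_sided_p(3.0):.4f}")
```

Under these assumptions, three probable errors amount to only about two standard errors, a two-sided probability of roughly .04. Three standard errors correspond to a probability below .003. The two thresholds are therefore not interchangeable.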


Results of the New Material Test, designed to measure ability to successfully attack new and unfamiliar reading material, were obtained at the end of the experiment. For the groups as wholes and for the average ability and high ability sections, the differences favored those taught by the experimental method. For the low ability sections, the difference was so small that it may be said to be negligible, being less than the probable error of the difference.

METHOD FAVORED BY THE DIFFERENCE BETWEEN MEANS AND THE SIGNIFICANCE OF THESE DIFFERENCES, AS SHOWN BY THE CRITICAL RATIO, FOR EACH MEASURE OF ACHIEVEMENT AND FOR EACH GROUP STUDIED

                          First Quarter   Second Quarter  Third Quarter   Fourth Quarter  Final
                          Vocab.  Sent.   Vocab.  Sent.   Vocab.  Sent.   Vocab.  Sent.   Vocab.  New mat.
Total groups              d  .73  p  .38  p 1.12  D 3.06  d 1.12  D 3.55  d 1.78  D 3.51  d 1.65  d 1.54
Low ability sections      d 2.50  d  .67  d 2.36  D 3.46  d 2.55  D 4.72  D 4.28  D 5.22  d 2.07  p  .89
Average ability sections  p 1.76  p  .55  p 2.10  p  .42  p  .86  P 4.07  p 1.11  d 2.72  d  .11  D 3.62
High ability sections     d  .23  p  .67  p 1.80  D 3.78  d  .29  D 3.09  p  .13  d 2.48  d 1.50  d 1.58

Column key:
Vocab.: quarterly, recognition of vocabulary specific to the basal texts; final, recognition of the general primary reading vocabulary.
Sent.: sentence comprehension.
New mat.: New Material Test.
Each entry gives the method favored and the critical ratio.

D  Direct Approach significantly (critical ratio of 3 or more) superior to the Preparatory Approach.
d  Direct Approach not significantly superior to the Preparatory Approach.
P  Preparatory Approach significantly (critical ratio of 3 or more) superior to the Direct Approach.
p  Preparatory Approach not significantly superior to the Direct Approach.


When all measures of achievement are thus combined, it may be seen that:

1. Of the 40 differences, 27 favored the experimental group taught by the Direct Approach, with 10 of these differences statistically significant. Thirteen favored the control group, with one difference of statistical significance.

2. Eight of 10 differences for the total groups favored the Direct Approach, with 3 of these differences statistically significant.

3. Nine of 10 differences for the low ability sections favored the experimental group (Direct Approach), with 4 of these differences statistically significant.

4. Seven of 10 differences for the average ability sections favored the Preparatory Approach, with one of these differences statistically significant. Of the 3 differences favoring the experimental or Direct Approach, one was statistically significant.

5. Seven of 10 differences for the high ability sections favored the Direct Approach, two of them statistically significant. None of the differences favoring the Preparatory Approach was statistically significant.
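The counts above can be checked mechanically. The sketch below encodes the ten comparisons for each group from the study's table of critical ratios. Each comparison is recorded by which method it favored: D or d for the Direct Approach, P or p for the Preparatory Approach, with capitals marking a critical ratio of 3 or more.

The codes are as read from the table together with the accompanying text. The dictionary layout and the helper name `tally` are illustrative choices, not part of the study.

```python
from collections import Counter

# Method favored for each of the ten comparisons per group, in table order:
# (Q1 vocab, Q1 sentence, Q2 vocab, Q2 sentence, Q3 vocab, Q3 sentence,
#  Q4 vocab, Q4 sentence, final vocabulary, new material).
# "D"/"d": Direct Approach favored; "P"/"p": Preparatory Approach favored;
# upper case marks a critical ratio of 3 or more.
CODES = {
    "total": "dppDdDdDdd",
    "low": "dddDdDDDdp",
    "average": "pppppPpddD",
    "high": "dppDdDpddd",
}


def tally(codes: str) -> Counter:
    """Count comparisons favoring each method, and how many were significant."""
    counts = Counter()
    for code in codes:
        method = "direct" if code.lower() == "d" else "preparatory"
        counts[method] += 1
        if code.isupper():
            counts[method + "_significant"] += 1
    return counts


overall = sum((tally(c) for c in CODES.values()), Counter())
for group, codes in CODES.items():
    print(group, dict(tally(codes)))
print("all groups", dict(overall))
```

Summing over the four groups reproduces the totals stated in item 1. That gives 27 differences favoring the Direct Approach (10 significant) and 13 favoring the Preparatory Approach (1 significant).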


The summaries show the percent of times during the year children chose to read books in free periods. They are arranged both within total groups and within ability sections, on the basis of range of achievement in sentence reading. They indicate that:

1. When the experimental and control groups are considered as wholes, the children within the experimental group, regardless of level of achievement, chose books in free periods an appreciably higher percent of times than did the children in the control group. The amount of difference increased as the level of achievement increased.

2. Differences in the low ability sections favored the experimental group at each level of achievement, with the amount of difference increasing as the achievement level increased.

3. Differences for the average ability sections favored the experimental group for the second, third, and fourth achievement levels, but favored the control group at the first level. The percent of times books were used was again related to achievement, since the difference increased as achievement increased.


THE METHOD FAVORED BY THE PERCENT OF DIFFERENCE AND THE CHANCES IN 100 THAT TRUE DIFFERENCES ARE GREATER THAN ZERO, FOR EACH LEVEL IN THE RANGE OF ACHIEVEMENT AND FOR EACH GROUP STUDIED

                          First level     Second level    Third level     Fourth level
Range of achievement      Method  Chances Method  Chances Method  Chances Method  Chances
                          favored in 100  favored in 100  favored in 100  favored in 100
Total group               D.A.    illeg.  D.A.    illeg.  D.A.    98      D.A.    94
Low ability sections      D.A.    79      D.A.    93      D.A.    90      illeg.
Average ability sections  P.A.    71      D.A.    83      D.A.    illeg.  D.A.    illeg.
High ability sections     D.A.    *       D.A.    illeg.  D.A.    89      D.A.    illeg.

Code:
D.A.: Direct Approach.
P.A.: Preparatory Approach.
*: could not be determined, since books were not used by children in the control group of this ability section at all during the free periods.
illeg.: illegible in the source copy.


4. Differences for the high ability sections consistently favored the experimental group. Again, with one exception (the second level of achievement), the difference increased as the level of achievement increased.


When the data are thus combined, it may be seen that:

1. With only one exception, children within the experimental group taught by the Direct Approach, regardless of level of achievement, read more frequently in free time than children in the control group taught by the Preparatory Approach.

2. Without exception, children taught by the Direct Approach, regardless of level of mental ability, read more frequently than children taught by the Preparatory Approach.

[illegible] both from the direction of the differences and from the number of chances in 100, that the differences were [illegible]. It may therefore be assumed that the variable (the method by which the groups were taught) influenced the [illegible] the Direct Approach, as shown by the percent of times children read books in free periods.
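The table reports, for each level, the chances in 100 that the true difference is greater than zero. The article does not say how these figures were obtained. A common procedure, assumed in this sketch, refers the critical ratio (difference divided by its standard error) to the normal curve. The function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def chances_in_100(critical_ratio: float) -> float:
    """Chances in 100 that the true difference is greater than zero.

    Assumes the sampling distribution of the difference is normal, so the
    answer is 100 times the standard normal CDF at the critical ratio.
    """
    return 100 * STANDARD_NORMAL.cdf(critical_ratio)


def critical_ratio_for(chances: float) -> float:
    """Inverse: the critical ratio that yields the given chances in 100."""
    return STANDARD_NORMAL.inv_cdf(chances / 100)


for cr in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(f"critical ratio {cr:.1f} -> {chances_in_100(cr):.1f} chances in 100")

# For example, 94 chances in 100 would correspond to a critical ratio of
# roughly 1.55 under this assumption.
print(f"94 in 100 -> critical ratio {critical_ratio_for(94):.2f}")
```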


In addition to the objective data summarized above, there are statements by teachers and supervisors regarding the effectiveness of the two methods as they [illegible] the satisfaction of accomplishment, in both teachers and children. These statements [illegible] results definitely [illegible] that:

Although the amount of work involved was about the same, the experimental teachers enjoyed teaching by the Direct method more than they [illegible] teaching by the Preparatory Approach, the method they had formerly used. Children seemed happier and appeared to develop reading skills [illegible] when taught by the Direct Approach.

The question of “when-do-we-get-a-book” was immediately settled in the Direct Approach. This, in itself, seemed to create a better feeling among parents, as well as children and teachers.

County supervisors, impressed by the favorable attitudes toward [illegible] within the experimental groups, recommended use of the Direct Approach for the following year, without waiting for a report of the final [illegible] in achievement.

Upon the data thus presented regarding achievement, use of books, and attitudes toward method, the following conclusions may be drawn:

I. Under the conditions of this experiment, and on the basis of measures of achievement for the experimental group (Direct Approach) and the control group (Preparatory Approach) as wholes,




1. The Direct Approach was slightly superior to the Preparatory Approach in producing skills of word recognition. When one recalls that the Preparatory Approach, and not the Direct Approach, was designed to foster word recognition, the slight difference takes on added significance.

2. The Direct Approach was significantly superior in developing the 
ability to comprehend and interpret sentences. 

3. Children taught by the Direct Approach were more successful in 
attacking new and unfamiliar materials than were children taught by the 
Preparatory Approach. 


Regardless of the measure of achievement used, both by direction of dif 
ferences and by the number of statistically significant differences, the Direct 
Approach as a method of teaching first grade reading was superior to the 
Preparatory Approach. 

II. Under the conditions of this experiment, and on the basis of meas 
ures of achievement for the ability sections (low, average, and high) within 
the experimental and control groups, 

1. In the low ability sections, the Direct Approach was significantly superior to the Preparatory Approach in developing achievement in both word recognition and sentence reading. The ability to attack new materials was practically the same for both groups.


2. In the average ability sections, the Preparatory Approach was slightly superior throughout much of the year in developing achievement in word recognition and sentence reading. However, the difference in attack on new materials significantly favored the experimental group.

3. In the high ability sections, one method seemed about as effective as the other. Counting the number of differences gives slight weight in favor of the Direct Approach.

While children of high and average ability progressed about equally well 
regardless of method, the children of low ability progressed significantly 
better under the Direct Approach. 

III. Throughout the study, in the experimental and control groups as 
wholes and in all ability sections, the standard deviations of the various measures of achievement indicated less variation in the experimental than in the
control group. This evidence supports the data regarding achievement in the 
low ability section, i.e., the statistically significant gains made by this group 
are emphasized by the decrease in variation of the total groups. 

IV. Under the conditions of this experiment, and on the basis of the 
percent of times during the school year children chose to read books as a leisure activity in free periods, it is seen that:











1. Regardless of level of ability, children taught by the Direct Approach read much more frequently than did children taught by the Preparatory Approach.

2. On the basis of achievement, as well as ability, the Direct Approach was decidedly superior, with one exception, in fostering the desire to read in leisure time.

Regardless of ability and achievement, the Direct Approach was decidedly superior to the Preparatory Approach in fostering “desires for and interest in reading,” as shown by the actual use of books in free periods.

V. Teachers felt that they taught more successfully, and that children were happier, when instruction in reading followed the Direct Approach methodology. While this is purely a subjective judgment, it nevertheless deserves consideration, for certainly better results may be expected when [illegible] and emotional satisfaction are achieved.


VI. The Direct Approach accomplishes the desired word recognition, [illegible] both increased comprehension of material read and increased ease and success in attacking new materials. It also contributes appreciably to the development of interests in and use of reading as a leisure activity. It is therefore to be hoped that this method of teaching first grade reading will receive further attention and increased use.

Experimentation with the Direct Approach should probably not stop at the end of the first school year. It should extend over at least the first three years of school life, during which reading instruction occupies the major portion of the program. Thus attempts may be made to determine whether gains made in one year continue into the following years. It is especially important to learn whether gains in independence continue, so that the teacher may step more quickly into the background, at least as far as reading instruction is concerned.

Since results for the low ability sections so outstandingly favored the Direct Approach, those responsible for programs for low ability groups should make further intensive study of this method. Such study should include children at various levels of chronological and mental age and much more diversified materials of instruction. Children in this group usually spend a long period of time in school before any degree of reading success is achieved. The evidence herein presented should therefore stimulate further study to determine to what extent this long period of “waiting for readiness” can be reduced. This seems important from a mental hygiene point of view as well as from an educational one.




























EDITORIAL























MISUSES OF THE FISHER STATISTICS 


PROFESSOR R. A. FISHER wrote chiefly for agricultural research. The 
key to his statistics is to be found in the following passage from his Statistical
Methods (Seventh edition, page 293): “With annual agricultural crops, to 
crop the experimental area in the previous year is nearly to double the labour 
of the experiment. What is often more serious, a year’s delay is incurred 
before the result is made available . . . It seems therefore to be always more 
profitable to lay down an adequately replicated experiment on untried land 
than to expend the time and labour available in exploring the irregularities 
of its fertility.” In consequence, practically all of his procedures are for 
random groups rather than for the equated groups with which we are 
familiar in controlled experimentation in education. The “t” test as customarily understood and applied is identically the test of significance of the difference between means by the formula,*


$$\sigma_{m_1-m_2}=\sqrt{\sigma_{m_1}^{2}+\sigma_{m_2}^{2}}=\sqrt{\frac{\sigma^{2}}{N_1}+\frac{\sigma^{2}}{N_2}}$$


when a common variance is estimated by taking a weighted average from 
the two groups. The formula for t is, of course, put into a shape that looks
to the novice quite unlike this one; but the complicated formula given by 
Fisher reduces algebraically to the above. Analysis of variance with one 
degree of freedom, which is now being employed considerably in educational 
research, also reduces to identically the same form: it is merely a compar 
ison of the means of two random groups where the statistical significance of 
the difference is tested by the standard error formula for random groups 
given above, though put in terminology so unfamiliar that the uninitiated 
reader would never suspect that fact. The setting in which both of these 
procedures are used in researches which this writer has met in the literature, 
and the accompanying comments, show that the workers are often unaware 


*See Fisher, R. A. Statistical Methods for Research Workers, 7th ed. p. 129. 


of the assumptions upon which their procedures are based. Both the lack of data displayed and unfamiliarity with the terminology prevent the reader from passing critical judgment upon the presentation.

Here is an example from a recent write-up. A row in a table by the analysis of variance technique reads: degrees of freedom, 1; (between) variance, 1.36; value of effect, [illegible]; error variance, 3.251; unstarred, meaning below the five per cent level.

A corresponding row in conventional statistics would read: mean for experimental group, 32.00; mean for control group, 32.41; difference between means, −.41; standard error of the difference, .64; ratio of difference to its standard error (t), .64; corresponding value at the five per cent level of significance, 2.04; probability that so great a difference would occur merely by chance fluctuation if the true difference were zero, .53.

The computations for this second way of putting it are no more laborious than for the first. The first way of putting it has no advantages whatever over the second, being in fact identically the same in meaning and in outcome. It also has the disadvantage that it would be understood by only a small percentage of readers at present. Even if the first achieved familiarity, the second way of stating the facts and interpretation would be intrinsically better, because it is more straightforward and meaningful.
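The editorial's claim that the two rows carry identical information can be checked numerically. The editorial does not give the group sizes, so this sketch assumes, hypothetically, 16 pupils in each group. That yields 30 degrees of freedom, matching the tabled 2.04, and with the reported error variance of 3.251 it reproduces the standard error of .64.

The sketch computes the classical t with a pooled variance and the F of a two-group analysis of variance, and shows that F equals t squared.

```python
import math

# Figures from the editorial's example; the group sizes are a hypothetical assumption.
MEAN_EXPERIMENTAL = 32.00
MEAN_CONTROL = 32.41
POOLED_VARIANCE = 3.251  # "error variance": weighted average of the two groups
N_EXPERIMENTAL = N_CONTROL = 16  # assumed; gives 30 degrees of freedom


def pooled_t(m1, m2, var, n1, n2):
    """Classical t: difference between means over its random-group standard error."""
    se_diff = math.sqrt(var / n1 + var / n2)
    return (m1 - m2) / se_diff, se_diff


def one_df_anova(m1, m2, var, n1, n2):
    """Between-groups mean square (1 df) and its F ratio against the error variance."""
    grand = (n1 * m1 + n2 * m2) / (n1 + n2)
    ms_between = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2  # SS divided by 1 df
    return ms_between, ms_between / var


t, se = pooled_t(MEAN_EXPERIMENTAL, MEAN_CONTROL, POOLED_VARIANCE, N_EXPERIMENTAL, N_CONTROL)
ms_b, f = one_df_anova(MEAN_EXPERIMENTAL, MEAN_CONTROL, POOLED_VARIANCE, N_EXPERIMENTAL, N_CONTROL)
print(f"SE of difference = {se:.2f}, t = {t:.2f}, t squared = {t * t:.3f}")
print(f"between-groups variance = {ms_b:.2f}, F = {f:.3f}")
```

The between-groups variance comes out near 1.34 rather than the printed 1.36, presumably because the published means are rounded to two decimals. The equality F = t² holds exactly whatever the figures.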
But the disciples of the Fisher school who think that even the Fisher system restricts them to these random group comparisons are mistaken. To be sure, these are the ones he stresses, for the reason indicated in the quotation. But in a section which most of these workers seem to have overlooked (Statistical Methods, 7th ed., page 127), he treats the case of the mean of differences between two sets of paired scores, calling the ratio of this mean to its standard error t. This also has “Student's” distribution. In fact it was “Student's” original z. But it can easily be shown (see Peters and Van Voorhis, Statistical Procedures, McGraw-Hill ed., p. 164) that the standard deviation of a set of paired differences divided by $\sqrt{N-1}$ gives

$$\sigma_{m_1-m_2}=\sqrt{\sigma_{m_1}^{2}+\sigma_{m_2}^{2}-2r_{12}\,\sigma_{m_1}\sigma_{m_2}}$$

This is the long-known correct formula for the standard error of the difference between means of matched groups. Some believe that, to obtain an “exact” distribution for the t ratio, we must throw away the precision that comes from matching groups. On that view we must return to the random-group method which we discarded in educational experimentation 30 years ago. To believe this is to misunderstand even the Fisher statistics.




It is very unfortunate that the Fisher statistics has been written up in a 
lingo that makes it seem quite new and very erudite, but with no effort to 
show how its processes lead back to well-known concepts in classical statistics. As a result, workers merely push buttons instead of understanding, and critically reacting upon, what they are doing. The newness consists either in new names for old concepts or in alternate ways of doing fundamentally the same things, which alternate ways run as feasibilities all through mathematical procedures. For example, take the general case of
analysis of variance, which is certainly one of the supposedly new Fisher 
techniques. This is not really an analysis of variance at all, since the several 
mean-squares do not add up to any predetermined initial value; it is, instead, 
the sum of squares that is analyzed, which is a very different matter. Then 
a standard deviation is estimated, in the customary manner (though much 
obscured by the peculiar directions given for its computation), from each 
sum of squares that interests us. We then compare with each other, to see whether they differ significantly, any two of these that our purpose requires.
The only departure from familiar procedure is that Fisher makes the com- 
parison by taking the logarithm of the one divided by the other and has 
tabled the distribution of these logs (z); when it would have been just as 
feasible to take their difference by subtraction and table these differences 
divided by their standard errors. If the research worker could see that he 
is doing the very simple and prosaic job of comparing two standard devia- 
tions, and that he should compare only those which his clearly conscious 
research purposes require him to compare, his procedure would be intelligent 
and interpretable, both for himself and for his readers. When he is playing 
with magic, merely following mechanical directions, he is likely to do all 
sorts of foolish things. 
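The editorial's description of analysis of variance can be illustrated with invented scores for three small groups. The sketch shows two things. First, the sums of squares add up exactly while the mean squares do not. Second, Fisher's z (half the natural logarithm of the ratio of two variance estimates) is simply the logarithm of the ratio of the corresponding standard deviations.

```python
import math

# Hypothetical scores for three groups of four pupils each.
groups = [[10, 12, 11, 13], [14, 15, 13, 16], [9, 11, 10, 12]]


def mean(xs):
    return sum(xs) / len(xs)


def sums_of_squares(groups):
    """Partition the total sum of squares into between- and within-group parts."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    ss_total = sum((x - grand) ** 2 for x in scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ss_total, ss_between, ss_within


k = len(groups)                  # number of groups
n = sum(len(g) for g in groups)  # number of pupils
ss_t, ss_b, ss_w = sums_of_squares(groups)
ms_t, ms_b, ms_w = ss_t / (n - 1), ss_b / (k - 1), ss_w / (n - k)

print(f"sums of squares: total {ss_t:.3f} = between {ss_b:.3f} + within {ss_w:.3f}")
print(f"mean squares: total {ms_t:.3f}, but between + within = {ms_b + ms_w:.3f}")

# Fisher's z: half the natural log of the ratio of the two variance estimates,
# which equals the log of the ratio of the two standard deviations.
z = 0.5 * math.log(ms_b / ms_w)
ln_sd_ratio = math.log(math.sqrt(ms_b) / math.sqrt(ms_w))
print(f"F = {ms_b / ms_w:.2f}, z = {z:.3f}")
print(f"ln(s_between / s_within) = {ln_sd_ratio:.3f}")
```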


This showing could be paralleled by many others. In fact, except for 
the distribution of the coefficient of correlation for small samples, there is 
not a single thing that is fundamentally new in the Fisher system, and nothing 
that could not be reduced to the concepts with which those who have studied 
classical statistics are familiar. For every one of the basic Fisher techniques, 
except the distribution of r as indicated above, there is feasible a parallel in 
classical statistics that is more precise, richer because more constructive in 
meaning, and just as “exact” in the mathematical sense. (“Student's” distribution, of course, is not a Fisher contribution. It was published by William Sealy Gosset under the encouragement of Karl Pearson, when Fisher was a boy only 18 years of age.)

All this is, of course, not to say that R. A. Fisher never did anything [illegible] research would work into some further ramifications not hitherto explored. These should have been avowedly linked up as details with the central stem of long-developing statistics, as corresponding developments by Soper, Spearman, Isserlis, and a host of others were. These relatively detailed, though important, developments did not warrant the idea that the foundations of statistics had been overturned, or the wide-spread notion that there is “a new statistics.”

Basically the applied Fisher techniques are marked off from the classical ones in two ways. The first is the rough assumptions that are made: independence of classes, equality of n's in the classes, normality of distribution of the variates, etc. The second is their limitation to the special rather than the general case.

I know that the very opposite impression prevails, supported by the fact that mathematicians are partisans of this system. Once the assumptions are made, the mathematics is simple and exact and fascinatingly beautiful. Mathematicians will frankly say that it is our concern as researchers, not theirs, whether the assumptions are legitimate in the particular research situations with which we work. It happens that in most of the research in our field the assumptions are so far-fetched as to abort the results for careful work.

For the field for which these techniques were developed, mostly agriculture, [illegible] fitness of some of the designs of the experiments to the spatial outlay in fields of irregular fertility even reaches the level of genius. Occasionally these techniques will also be useful for rough preliminary exploratory research in other fields, including psychology and education. But educationists and psychologists may, out of some sort of inferiority complex, grab indiscriminately at them and employ them where they are unsuitable. If so, education and psychology will suffer another slump in prestige, such as they have often hitherto suffered in consequence of the pursuit of fads.
CHARLES C. PETERS.


Editor's note: This is the first of a series of critical comments on research and statistical procedures.




















Research Abstracts and Bibliographies


Address all communications relative to research abstracts and bibliographies to A. S. Barr, University of Wisconsin, Madison, Wisconsin.














BULLETINS 


BOHMAN, ESTHER L. and DILLION, JOSEPHINE. The Librarian and the Teacher of Music. Chicago: American Library Association, 1942. 55 pp.

Discusses the library and the music curriculum, music in experience, library skills, attitudes, etc. Bibliography.

CUNLIFFE, REX B. and others. Guidance Practice in New Jersey: A progress report. New Brunswick, New Jersey: Rutgers University Studies in Education, 1942. 147 pp.

Discusses trends, best practices and programs in action.

DALE, EDGAR and SPICER, VERNA. Newspaper Discrimination: An annotated bibliography. Columbus, Ohio: Bureau of Educational Research, Ohio State University, 1942. 27 pp.

Furnishes notes on some 45 references.

EVENDEN, EDWARD S. Teacher Education in a Democracy at War. Washington, D. C.: American Council on Education, 1942.

Discusses implications of the war for teacher education, lessons from the war of 1917-18, post-war trends, English experiences, and a proposed program.


EADS, LAURA K. Check List for Reviewing a Reading Curriculum. New York City: Division of Curriculum Research, Board of Education, 1942. 22 pp.

Consists of statements of principle considered important in studying and appraising various aspects of a reading curriculum.

GATES, ARTHUR I. and PRITCHARD, MIRIAM C. Teaching Reading to Slow-Learning Pupils: A report on an experiment in New York City Public School 500 (Speyer School). New York City: Bureau of Publications, Teachers College, Columbia University, 1942. 65 pp.

Discusses the reading program in Public School 500, the development of reading abilities in Public School 500, and comparison with other schools.

GILBERT, LUTHER C. and GILBERT, DORIS WILCOX. Training for Speed and Accuracy of Visual Perception in Learning to Spell: A study of eye movements. Vol. VII, No. 5. Berkeley and Los Angeles: University of California Publications in Education, 1942. pp. 351-426.

The authors conclude that judicious training can, without rushing the pupil, effect an increase in rate and efficiency and can improve perceptual habits.


WILLARD E. and others. State 


§ | Finance Systems. National Ed- 

tion Association Research Bulletin, 
XX. Washington, D. C.: National 
| ition Association, November, 1942. 


Discusses sources of school revenues, 
ment and trends. 


Louis P. and others. Services of 

\rthopedically Handicapped: A 

rt of a study made under the 

es of the trustees of the Widener 

Memorial School for Crippled Chil- 

n and the Board of Public Educa- 

n. Philadelphia: School District of 
adelphia, 1942. 115 pp. 


Discusses problems of admission and 
issal, the program, medical service, 
logical service, guidance, transpor- 
n, feeding, etc. Bibliography. 


SON, A. P. “The Prediction of 
Scholastic Achievement for Freshman 
gineering Students at Purdue Uni- 
sity,” Studies in Higher Education, 
[V. Lafayette, Indiana: Division of 
cational Reference, Purdue Univer- 
1942. 24 pp. 


The author finds that a combination of measures, including the Iowa Mathematics Training Test, Cooperative Intermediate Algebra (Form P), tenth of high-school graduating class, and Thurstone "V" factor, has the highest predictive value.


AUVER, GRAYSON N., and others. Education for War and Peace. Stanford University Press, 1942. 39 pp.


Discusses education essential to victory and to peace with a program of [...]


MORRISON, JOHN, and others. The Early
Development of Number Concepts. 
University of London Press, St. Hugh's 
School, Bickley, Kent, 1942. 63 pp. 


Discusses the arithmetical background 
of children entering school, number situ-
ations, number games, and number pic-
tures. Bibliography. 


MOSIER, CHARLES I. Evaluating Rural
Housing: The development of the 
Florida Housing Inventory and the 
Index of Housing Adequacy. Gaines-
ville, Florida: Curriculum Laboratory,
College of Education, University of 
Florida and State Department of 
Education, 1942. 88 pp. 


Describes Housing Index with instruc-
tions to raters.


POTTHOFF, EDWARD F. "The Combina-
tions of Subjects of Specialization for 
High School Teachers of Foreign Lan- 
guages,” Studies in Higher Education, 
No. 3. Urbana, Illinois: University of 
Illinois, December, 1942. 39 pp.


Discusses the conditions relative to 
supply and demand, and the qualifica- 
tions of prospective teachers of foreign 
languages. 


REAVIS, WILLIAM C. (Editor). The School and the Urban Community: Proceedings of the Eleventh Annual Conference of Administrative Officers of Public and Private Schools. Chicago: University of Chicago Press, 1942. 243 pp.

Contains 15 papers on the nature of school and community relations; the utilization of community resources; education and the improvement of community life; school personnel and community life; and community study and educational progress.


REID, SEERLEY. Radio in the Schools of Ohio. Washington, D. C.: Federal Security Agency, 1942. 34 pp.

Discusses radio and sound equipment, classroom use of school broadcasts, other uses of radio in school, and recommendations.


REINHARDT, EMMA, and BEU, FRANK A. Changes in the Student Body of the Eastern Illinois State Teachers College During the Fifteen-Year Period, 1925-26 to 1940-41. Eastern Illinois State Teachers College Bulletin, No. 159. Charleston, Illinois: Eastern Illinois State Teachers College, July, 1942. 62 pp.

Discusses the general characteristics; 
geographic distribution; family, voca- 
tional, and cultural background; elemen- 
tary and secondary school experiences; 
mental ability and college achievement; 
and other characteristics of the group. 


REMMERS, H. H., THOMPSON, W. R., 
and VAURIO, A. E. "The Effect of
Participation in Extra-Curricular Music 
upon Scholastic Achievement,” Studies 
in Higher Education, XLVI. Lafayette, 
Indiana: Division of Educational Ref- 
erence, Purdue University, July, 1942. 
pp. >-10 


The authors conclude that extra-curricular pupils do as well as or better than non-extra-curricular pupils.


REYMERT, MARTIN L., and others. Eleventh Annual Conference on Delinquency Prevention. Department of Public Welfare, State of Illinois, 194[...]. 222 pp.

Presents a series of papers on the subject of delinquency prevention.


RUFSVOLD, MARGARET I. World War Information. An annotated list of current books and pamphlets for teachers, students, and adult discussion groups. Bulletin of the School of Education. Bloomington, Indiana: Bureau of Cooperative Research and Field Service, Indiana University, 1942. 128 pp.

A classified annotated bibliography of world war information.


RUSSELL, JOHN DALE. Terminal Education in Higher Institutions: With special reference to the readjustment of higher education to meet current national needs. Proceedings of the Institute for Administrative Officers of Higher Institutions. Chicago: University of Chicago Press, 1942. 198 pp.

Presents a series of 18 papers on various aspects of terminal education in higher education.


SMITH, HENRY LESTER, and EATON, MERRILL THOMAS. An Analysis of Arithmetic Textbooks: Second period, 1821 to 1850, and third period, 1851 to 1880. Bulletin of the School of Education, XVIII. Bloomington, Indiana: Bureau of Cooperative Research and Field Service, Indiana University, November, 1942. 112 pp.

Discusses space given to various topics and type of content and related topics.








YER, GEORGE D., and others. The [...] of a Survey of the Public Schools of Newark, New Jersey. New York City: Bureau of Publications, Teachers College, Columbia University, 1942.

Discusses administrative organization, business administration, the building [...], personnel, education at various [...], exceptional education, guidance and improvement practices.


YOUNG, IONA. "A Preliminary Survey
of Interests and Preferences of Primary 
Children in Motion Pictures, Comic 
Strips, and Radio Programs as Related 
to Grade, Sex, and Intelligence Differ- 
ences,” Bulletin of Information, XXII. 
Emporia, Kansas: Kansas State Teach- 
ers College of Emporia, September, 
1942. 39 pp. 

The author presents a series of recom- 
mendations based upon the findings. 




























Research News and Communications


Address all research news and communications to Carter V. Good, 
Teachers College, University of Cincinnati, Cincinnati, Ohio. 
















History of the P. E. A.—Berdine J. 
Bovard of New Mexico College of Agri- 
culture and Mechanic Arts completed 
in December, 1941, at the University of 
California (Berkeley) a doctorate dis- 
sertation, entitled “A History of the 
Progressive Education Association, 1919- 
1939." The Progressive Education Asso-
ciation was organized April 4, 1919, un- 
der the leadership of Stanwood Cobb, 
Eugene Randolph Smith, Mrs. Laura C. 
Williams, and others. The Association's 
purpose in its early years was to dis- 
seminate progressive ideology and to 
disclose the results of educational ex- 
perimentalism. In recent years, the at- 
tention of the Association has shifted
from methodology and the school en- 
vironment to a reinterpretation of de- 
mocracy that would include equality of 
economic opportunity. 

From 1919-1923 the Association's ac- 
tivities were confined to annual confer- 
ences and publication of bulletins at 
irregular intervals. After 1924 the 
Progressives became more articulate 
through publication of a journal, 
Progressive Education. 

In 1930 the Progressive Education As-
sociation inaugurated a program of re-
search, launching its efforts by appoint- 
ing a Commission on the Relation of 
School and College (Wilford M. Aikin, 
chairman). This Commission undertook 
to determine whether students who had 


















not pursued secondary-school subjects ordinarily required for college entrance would be handicapped in college work. Thirty experimental secondary schools and two hundred colleges participated in the study. A recent report indicates that college students from the experimental schools maintained slightly higher grade-point averages than an equated group from traditional schools. Although the difference in grade-point averages was not statistically significant, students from the thirty cooperating schools did display a wider range of interests and undertook more research independently.

An outgrowth of the 1930 Commission was the Commission on the Secondary-School Curriculum, created in May, 1933, whose purpose was to discover what should constitute the secondary-school curriculum. This Commission recommended that the following needs of pupils should form the basis for formulating curricula: (1) personal-social relationships, (2) social-civic relationships, (3) economic relationships, and (4) personal living.

Other commissions that have undertaken research for the Association are the Commission on Human Relations and a joint commission with the National Education Association on Resources and Education. Certain committees, e.g., the Committee on Home














1943] RESEARCH NEWS AND COMMUNICATIONS 555


Relationships and the Committee on
Socio-Economic Problems, have

le contributions. 
The commissions and committees of the Association have been mainly subsidized by the General Education Board. More than a million and a half dollars has been subscribed by this foundation to the research program of the Association.


The Progressive Education Association inaugurated in 1929 a program of institutes as in-service training for teachers. Later these institutes were expanded and metamorphosed into the Summer Workshops. The Workshops offer secondary-school teachers assistance with specific problems under the guidance of experts.
In 1932 the Association merged with the New Education Fellowship, which represents progressive education in Europe. Thus, the Association became international in scope.


Teachers' Duty in Wartime.—Ralph H. Ojemann of Iowa State University and a group of associates have prepared a pamphlet of sixteen pages, entitled The Teacher's Responsibility in War. This booklet was written to acquaint teachers with the mental-hygiene problems of children as created by war and to suggest procedures by which teachers can help the child to [...] such difficulties. In time of war, no nation can afford to let its children suffer physically or emotionally. The future will need men and women with strong personalities. When viewed in this light, teachers have a most significant responsibility.

The bulletin considers a variety of conditions that may affect the child mentally and emotionally. The absence of a
beloved brother or father, called from 
home for military duty or industrial 
service elsewhere, restriction of living 
conditions and play space in crowded 
war-plant areas, repeated emphasis on 
war in radio and newspaper and motion- 
picture experiences, and departure of 
the mother from the home to the fac- 
tories—all may have disturbing effects 
on the child's growth. 

To protect the child from fear, inse- 
curities, and other forms of mental 
strain produced by such changes in the 
child's environment, seven concrete sug- 
gestions are given, based on fundamen- 
tal research in child development, but 
couched in simple language and in such 
terms that they can be applied in prac- 
tically every community. The recom- 
mendations relate to: methods for learn- 
ing about the conditions in the indi- 
vidual community where the teacher is 
working, studying each child as an in- 
dividual, helping the child to find clear 
and simple answers to his questions, ap- 
praising war activities for children in 
terms of the fundamental effects on de- 
velopment, and conducting class discus-
sion in ways conducive to mental health.
The last suggestion relates to the 
teacher's own mental health. 

The emphasis throughout the analy- 
sis is upon building strong and healthy 
personalities that are not twisted by 
emotional strain. 


N. E. A. Research Division.—The
N. E. A. Research Division was estab- 
lished to provide the officers of the 
Association with necessary information 
and to anticipate the needs of members 
of the Association with respect to pro- 
fessional problems. In ordinary times 









the Division has focused its attention 
both upon technical professional prob- 
lems (e.g., administration and instruc-
tion) and upon welfare questions (e.g.,
retirement and salaries). In wartime the 
responsibilities of the Division have been 
complicated, because all issues have ac- 
quired wartime aspects which color and 
sometimes neutralize the facts of long- 
time significance. As a result, much of 
the effort of the Division has been given 
to interpretations of the rules, trends, 
and laws resulting from emergency con-
ditions.

Special Studies.—The Division is now
completing its biennial salary survey of 
city-school systems for the school year, 
1942-43. The work has been complicated 
by so-called bonus payments and special 
increases. A preliminary report will be 
issued as the February number of the 
Research Bulletin and detailed reports 
will be issued from time to time up until 
the fall of 1943. 


A report of the effect of a year of war 
upon the schools is planned for the April 
issue of the Research Bulletin. The Octo- 
ber number will deal with the education 
of the slow-learning pupil and will be 
similar to the report on methods used 
with superior students (September, 1941). 

In the calendar year, 1943, the Re-
search Bulletin will be issued four times
(February, April, October, and Decem-
ber) rather than five times as in previous
years.


Yearbooks.—The Division helped in 
the recently completed 1943 volume of 
the American Association of School Ad- 
ministrators. Under the topic, Schools 
and Manpower, the yearbook deals essen- 
tially with the problem of education for 
economic competence. The 1944 volume 


now in process will deal with the gen-
eral problem of morale.

In cooperation with the Department 
of Elementary School Principals, the 
Division is helping to develop the 1943 
yearbook on the topic, citizenship train-
ing at the elementary-school level. The
volume will be available in the summer 
of 1943. The 1944 yearbook will deal 
with creative teaching and learning in 
the elementary school. 


Committees and Departments.—In co-
operation with the N. E. A. Committee
on Tenure, the Division is surveying 
tenure and contractual procedures in the 
faculty employment of normal schools 
and teachers colleges. The report is ex- 
pected in June, 1943. A summary has 
been made of court decisions on tenure 
for the calendar year, 1942, which will
be published in April, 1943. 

A study is being made of the benefits 
under federal social security, as com- 
pared with the benefits under various 
State teacher-retirement systems. It is 
planned to have this material ready for 
distribution by next summer. 

Bibliographic, editorial, and consulta- 
tive services are being supplied to the 
Committee on International Relations 
(particularly the inter-American news- 
letter, Among Us), the Committee on 
Equal Opportunity, the National Com- 
mission on the Defense of Democracy 
through Education, the Committee on 
Cooperatives, the Committee on Teacher 
Preparation and Certification, the Com-
mittee on Academic Freedom, and the
Committee on Tax Education and School
Finance. 

A statistical summary of libraries in 
teachers colleges has been prepared for 
the American Association of Teachers 








¢ gen- 


irtment 
S, the 
> 1943 
train- 

The 
immer 
1 deal 
ng in 


in co 
mittee 
eying 
n the 
hools 
S ¢x- 
y has 
enure 


wil! 


nefits 
com- 
rious 
t is 
+ for 


ulta- 
the 
i0ns 


Cws- 





943} RESEARCH NEWS AND COMMUNICATIONS 557 


Colleges, a department of the National 
1 Association. 

Projects.—In cooperation with the American Association of School Administrators, through the Educational Research Service, the Division is preparing circulars on school wartime activities, schedules in city-school systems, and summaries of educational articles in magazines.

With the aid of funds supplied by the Highway Education Board, the Division will soon issue a mimeographed checklist on school safety in wartime. This [...] is a wartime interpretation of an [...] checklist on safety and safety education issued through the Safety Education Projects of the Division.

With the help of a Canadian teacher, the Division will issue in a few weeks sample units on Canada suitable for use in elementary schools.

Bibliographies.—The Division is working continuously on the preparation and revision of numerous bibliographies used by its Information Section. Definite progress has been made on a revision of a large bibliography on the Far East and on the preparation of a reference list on postwar planning.

Government Relationships.—The Division participated in conferences that led to the publication last fall of the handbook, School Transportation in Wartime. This report has been of help to the Office of Defense Transportation.


An instructional handbook on counterfeiting has been prepared for the U. S. Secret Service. It should be published in the spring of 1943.

Progress has been made on a statement of the scope of education for the Bureau of Labor Statistics.


Reciprocal relationships have been 
carried on with O. P. A., the Bureau of 
the Census, the Social Security Board, 
the Office of Education, and various 
other federal agencies.


Information Releases.—Much time has
been given this year to the production 
of brief releases on various topics of 
current interest. These have included re- 
leases on salary trends (Schools and 
Current Economic Trends), tax educa- 
tion and school finance, social security, 
manpower, and federal legislation. 


In the field of state legislation the re- 
lease, Aids to Bill Drafting, and the sum- 
mary of school legislation are particu- 
larly noteworthy. 


Letters of Inquiry.—The Division con-
tinues to answer hundreds of letters on 
various individual and school problems. 
These letters often require the prepara- 
tion of bibliographies, consultation with 
government officials, and compilations of 
research data. An especially heavy de- 
mand has been met for data on salaries
and salary scheduling in city school 
systems. 

Comment.—More than a decade ago 
the Division called attention to the fact 
that it was impossible to develop poli- 
cies and procedures with respect to 
teacher supply and demand problems, as 
long as official state statistics were not 
compiled systematically. This same prob- 
lem of inadequacy in basic statistical 
data is an impediment in dozens of other 
areas of education. It is impossible to 
obtain a complete statistical picture of 
school systems in the United States. Too 
frequently it is necessary to make esti- 
mates on the basis of incomplete and 
often inaccurate reports. 




Sometimes administrators complain 
that too many questionnaires are being 
circulated. This is true, but the number 
would be reduced, if competent agencies 
were able to present summaries that 
would be recognized as basic and reliable 
sources. The great need is first to ap- 
point in local and state school offices 
enough persons assigned to prepare sta- 
tistical reports; and second, to give these 
persons freedom to work upon research 
and not to divert their energies to ad- 
ministrative tasks. 

Without this statistical base, much 
policy-making and many administrative 
practices must be by “rule of thumb.” 
Research takes time and costs money, but 
at the critical point when decisions must 
be made it pays rich dividends to those 
who need the facts when they want 
them.

This summary was prepared by Frank W. Hubbard, director of the Research Division.


Educational Policies Commission.—The Commission in 1942 published three brief documents on the war program and the schools. These were entitled A War Policy for American Schools, The Support of Education in Wartime, and What the Schools Should Teach in Wartime.

Also in 1942 the Commission pub- 
lished Free Men, a pageant based on its 
earlier report, The Education of Free 
Men in American Democracy. This 
pageant was prepared by the Music Edu- 
cators National Conference in coopera- 
tion with the Educational Policies Com- 
mission, and was produced at a biennial
convention of the Music Educators at 
Milwaukee in April, 1942. Since then 
the script has been generally available 
and the pageant has been widely pre- 


sented to audiences throughout the 
country. 

In 1942 the Commission inaugurated its series of radio programs under the general title of The National Teacher Meeting by Radio. Plans are now being laid for one or more National Parent-Teacher Meetings by Radio. In the first of the National Teacher Meetings by Radio, over 80,000 teachers in definitely organized groups were tuned in, and each of these organized groups followed up the radio presentation with a discussion and application to their own local situation. In addition, many members of the public and many individual teachers not part of organized listening groups attend these radio meetings. The Commission plans to continue this activity in 1943.

The Commission has two major reports in preparation for 1943. Both deal with the role of education after the war. One of the documents is concerned primarily with the reorganization of secondary education, so that it may effectively serve all the youth of the United States. The center of attention will be the age group from 17 to 21, although some attention will be given to the preceding and succeeding educational programs. The report is to consist largely of a series of descriptions of hypothetical, but representative, American communities, giving in some detail the type of educational program that ought to function in these communities, the curriculum, the qualifications of the teachers, the plant and facilities, the relations to the community and the various organized agencies in the community, the relation between the educational program and work experience, the organization of school districts, and the responsibilities of local, state, and federal agencies in connection with education.


The second report will deal with the [...] of education as an international instrument to help secure the peace and [...] the democracy for which the war is being fought. It is planned that this document will describe the principles which should govern the international utilization of education. It is well known that education has in the past been allowed to function in such a way as to promote the conditions which lead to international conflict. In this report the Commission will address itself to the problem of reversing this historic trend and of making education become an instrument for the development of a more [...] peace and a wider democracy for [...] people.

This progress report was prepared by William G. Carr.


Eight-Year Study.—As analyzed by Ralph W. Tyler, the implementation of the findings of the Eight-Year Study involves four issues that seem critical:

1. The provision of a more rational [...] of college-entrance requirements. The Eight-Year Study showed that specific subject and unit requirements are not as satisfactory as requirements based on the abilities essential for college work.

2. The development of better articulation between secondary school and college. The nature of the most serious [...] was identified in the Follow-up Study. They involve changes both in curriculum and in guidance programs.

3. The wider, growing spread of curriculum experimentation. The war may stop the experiments that are needed to provide an adequate basis for secondary-school improvements.

4. The development of more careful and more comprehensive appraisal of both curriculum and instruction. The Eight-Year Study developed methods that may help to improve evaluation, if not stopped by the war.


Evaluation in 4-H Cotton Work.—The early attitude that scientific information is something for the college professor of agriculture and has no usefulness to the practical dirt farmer is fast passing out of the picture. The younger generation has a high regard for the practical value of scientific information relating to agriculture. Farm boys tested in Arkansas felt that a farmer can get "much" practical help in running his farm from scientific information about crops, soils, and farm animals.

The appreciation of scientific informa- 
tion is being taught to 4-H Club mem- 
bers by the county agents at meetings 
and through visits to the agricultural ex- 
periment stations. Members of a 4-H 
Club were tested at the beginning and 
end of their year’s work. It was found 
that their appreciation of scientific infor- 
mation increased from 2.13 to 2.28 on a 
scale where 3.00 was "very much," 2.00
"much," 1.00 "some," and 0.00 "none."


An equivalent group of 184 nonmem- 
bers of 4-H Clubs having the same aver- 
age appreciation of 2.13 points at the 
beginning of the year became less appre- 
ciative by the end of the year, when 
their average score was only 1.96 points 
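Because the report gives only group means on the 0.00 to 3.00 appreciation scale, each group's change over the year is a simple subtraction of the beginning mean from the end-of-year mean. The short sketch below works through that arithmetic from the reported means; the helper name `gain` is hypothetical, introduced here for illustration, and the sketch does not test whether the difference between the groups is statistically significant.

```python
# Worked arithmetic for the 4-H appreciation scale described above.
# Scale anchors as reported: 3.00 "very much", 2.00 "much",
# 1.00 "some", 0.00 "none". Only group means are reported, so this
# works with means rather than individual pupils' scores.

def gain(initial: float, final: float) -> float:
    """Hypothetical helper: change in mean scale score over the year."""
    return round(final - initial, 2)

club_gain = gain(2.13, 2.28)       # 4-H Club members
nonmember_gain = gain(2.13, 1.96)  # equivalent group of 184 nonmembers
difference = round(club_gain - nonmember_gain, 2)

print(f"4-H members: {club_gain:+.2f}")
print(f"Nonmembers:  {nonmember_gain:+.2f}")
print(f"Difference:  {difference:.2f} scale points")
```

Run as a script, it prints a gain of +0.15 for the club members, a loss of 0.17 for the nonmembers, and a difference of 0.32 scale points between the two groups.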

When the same two groups were asked 
where they could get the best informa- 
tion about growing cotton, they wrote 
their replies in a space provided on the 
test form. The analysis of their replies 
showed that 22 percent of the 4-H members and only 5 percent of the nonmembers mentioned the agricultural experiment station.

The test also included attitude ques- 
tions on soil-erosion control. Both groups 
became considerably more favorable, but 
the difference in gains in favor of the
4-H group was not statistically signifi-
cant.

Although the 4-H group gained but 
little more than the non-4-H group in 
cotton economic information, the 4-H 
group gained much more in cotton-
growing information. Members of the
4-H cotton club were taught very little
cotton-economic information during the 
period of the study. Much attention was 
given to information relating to the 
growing of cotton on their projects. 
The report of the study is:

Fred P. Frutchey, W. J. Jernigan, and 
W. M. Cooper, Evaluation in 4-H Cotton 
Demonstration, Arkansas, 1940. Exten- 
sion Service Circular 391, U. S. Depart- 
ment of Agriculture, October, 1942. 


Psychology Monographs.—Stanford
University Press is the publisher for a 
new series of monographs sponsored by 
the American Association for Applied 
Psychology, under the general editorship 
of Herbert S. Conrad, Institute of Child 
Welfare, University of California. Asso- 
ciate editors of the series are Robert G. 
Bernreuter, E. A. Doll, Arthur W. Korn- 
hauser, Harriet E. O'Shea, C. R. Rogers, 
F. L. Ruch, P. M. Symonds, and Lewis 
M. Terman. 

The first monograph, scheduled for 
publication in March, 1943, is Exploring 


the Wartime Morale of High School 
Youth, by Lee J. Cronbach of Washing-
ton State College. All monographs will
be reproduced by photolithography from 
typewriter type and will be paper bound.
A special discount is given on continua- 
tion orders for all monographs as they 
are published. 


National Society for the Study of 
Education.—For a number of years the
Public School Publishing Company has 
been the agent of the Society for the sale 
of the yearbooks. The Board of Directors 
has now deemed it advisable to make 
other arrangements for the sale of the 
yearbooks, in order that the services in-
volved in properly distributing the year-
books may be more closely co-ordinated
with the services of the yearbook com- 
mittees and the other activities of the 
Society which have always been super- 
vised by the Board. Inasmuch as the 
office of the Society was moved to the 
University of Chicago after the death of 
Guy M. Whipple in 1941, arrangements 
have been made for the sale of the Soci- 
ety’s yearbooks through the publications 
office of the Department of Education of 
this University. Under this arrangement 
the policies and procedures governing 
the distribution of the yearbooks can be 
directed by the Board. Members of the 
Society are requested to assist in the pro- 
motion of yearbook sales by informing 
friends and acquaintances of this change 
in the agency. Orders may be sent to the 
Department of Education of the Univer- 
sity of Chicago or to the office of the 
National Society.











